Abstract — It has recently been observed that the permanent of 
a non-negative square matrix, i.e., of a square matrix containing 
only non-negative real entries, can very well be approximated 
by solving a certain Bethe free energy function minimization 
problem with the help of the sum-product algorithm. We call the 
resulting approximation of the permanent the Bethe permanent. 

In this paper we give reasons why this approach to approx- 
imating the permanent works well. Namely, we show that the 
Bethe free energy function is convex and that the sum-product 
algorithm finds its minimum efficiently. We also discuss the fact 
that the permanent is lower bounded by the Bethe permanent, 
and we comment on potential upper bounds on the permanent 
based on the Bethe permanent. We also present a combinatorial 
characterization of the Bethe permanent in terms of permanents 
of so-called lifted versions of the matrix under consideration. 

Moreover, we comment on possibilities to modify the Bethe per- 
manent so that it approximates the permanent even better, and 
we conclude the paper with some conjectures about permanent- 
based pseudo-codewords and permanent-based kernels. 

Index Terms — Bethe approximation, Bethe permanent, graph 
cover, partition function, perfect matching, permanent, sum- 
product algorithm. 



I. Introduction 

Central to the topic of this paper is the definition of the 
permanent of a square matrix (see, e.g., [1]). 



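Definition 1 Let $\boldsymbol{\theta} = (\theta_{i,j})_{i,j}$ be a real matrix of size $n \times n$.
The permanent of $\boldsymbol{\theta}$ is defined to be the scalar

\[
  \operatorname{perm}(\boldsymbol{\theta}) = \sum_{\sigma} \prod_{i \in [n]} \theta_{i,\sigma(i)}, \tag{1}
\]

where the summation is over all $n!$ permutations of the set
$[n] \triangleq \{1, 2, \ldots, n\}$.

Contrast this definition with the definition of the determinant
of $\boldsymbol{\theta}$, i.e.,

\[
  \det(\boldsymbol{\theta}) = \sum_{\sigma} \operatorname{sgn}(\sigma) \prod_{i \in [n]} \theta_{i,\sigma(i)},
\]

where $\operatorname{sgn}(\sigma)$ equals $+1$ if $\sigma$ is an even permutation and equals
$-1$ if $\sigma$ is an odd permutation.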
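
The two definitions can be compared side by side in a few lines of code. The following is a minimal sketch, not an efficient method: it evaluates both sums term by term over all $n!$ permutations. The function names `permutation_sign`, `leibniz_sum`, `perm`, and `det` are illustrative choices and do not come from the paper.

```python
from itertools import permutations


def permutation_sign(sigma):
    """Return +1 for an even permutation and -1 for an odd one (by counting inversions)."""
    n = len(sigma)
    inversions = sum(1 for a in range(n) for b in range(a + 1, n) if sigma[a] > sigma[b])
    return -1 if inversions % 2 else 1


def leibniz_sum(theta, signed):
    """Sum of prod_i theta[i][sigma(i)] over all permutations sigma.

    With signed=True every term is weighted by sgn(sigma) (determinant);
    with signed=False all terms enter with weight +1 (permanent).
    """
    n = len(theta)
    total = 0
    for sigma in permutations(range(n)):
        term = permutation_sign(sigma) if signed else 1
        for i in range(n):
            term *= theta[i][sigma[i]]
        total += term
    return total


def perm(theta):
    return leibniz_sum(theta, signed=False)


def det(theta):
    return leibniz_sum(theta, signed=True)


theta = [[1, 2], [3, 4]]
print(perm(theta), det(theta))  # 1*4 + 2*3 = 10 and 1*4 - 2*3 = -2
```

The only difference between the two functions is the sign factor, which is exactly the difference between the two formulas above.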
A. Complexity of Computing the Permanent 

Because the definition of the permanent looks simpler than 
the definition of the determinant, it is tempting to conclude 
that the permanent can be computed at least as efficiently 
as the determinant. However, this does not seem to be the 
case. Namely, whereas the arithmetic complexity (number of 
real additions and multiplications) needed to compute the 
determinant is in $O(n^3)$, Ryser's algorithm (one of the most 
efficient algorithms for computing the permanent) requires 
$O(n \cdot 2^n)$ arithmetic operations [2]. This clearly improves upon 
the brute-force complexity $O(n \cdot n!) = O\bigl(n^{3/2} \cdot (n/e)^n\bigr)$ for 
computing the permanent, but is still exponential in the matrix 
size. 
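
To make the complexity gap concrete, the following sketch contrasts the brute-force sum with Ryser's inclusion-exclusion formula, $\operatorname{perm}(\boldsymbol{\theta}) = \sum_{S \subseteq [n]} (-1)^{n - |S|} \prod_{i} \sum_{j \in S} \theta_{i,j}$. It is an illustration under simplifying assumptions: the function names are hypothetical, and this plain variant spends $O(n^2)$ operations per subset; the $O(n \cdot 2^n)$ count quoted above needs a Gray-code ordering of the subsets, which is omitted here for readability.

```python
from itertools import permutations


def perm_bruteforce(theta):
    """Permanent via definition (1): n! terms with n factors each."""
    n = len(theta)
    total = 0
    for sigma in permutations(range(n)):
        term = 1
        for i in range(n):
            term *= theta[i][sigma[i]]
        total += term
    return total


def perm_ryser(theta):
    """Permanent via Ryser's inclusion-exclusion formula over column subsets.

    Plain variant: O(n^2) work per subset. Visiting the subsets in Gray-code
    order and updating the row sums incrementally would reduce this to O(n).
    """
    n = len(theta)
    total = 0
    for mask in range(1, 1 << n):  # every non-empty subset S of the columns
        columns = [j for j in range(n) if (mask >> j) & 1]
        product = 1
        for i in range(n):
            product *= sum(theta[i][j] for j in columns)
        sign = -1 if (n - len(columns)) % 2 else 1
        total += sign * product
    return total


theta = [[2, 0, 1, 3], [1, 1, 0, 2], [0, 4, 1, 1], [3, 1, 2, 0]]
print(perm_ryser(theta) == perm_bruteforce(theta))
```

With integer entries both functions use exact arithmetic, so the comparison is an equality test rather than a tolerance check.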

In terms of complexity classes, the computation of the 
permanent is in the complexity class #P ("sharp P" or "number 
P") [3], where #P is the set of the counting problems associated 
with the decision problems in the class NP. Note that even 
the computation of the permanent of matrices that contain 
only zeros and ones is #P-complete. Therefore, the above- 
mentioned complexity numbers for the computation of the 
permanent are not surprising. 

B. Approximations to the Permanent 

Given the difficulty of computing the permanent exactly, 
and given the fact that in many applications it is good enough 
to compute an approximation to the permanent, this paper 
focuses on efficient methods to approximate the permanent. 
This relaxation in requirements, from exact to approximate 
evaluation of the permanent, allows one to devise algorithms 
that potentially have much lower complexity. 

Moreover, we will consider only the case where the matrix 
$\boldsymbol{\theta}$ in (1) is non-negative, i.e., where all entries of $\boldsymbol{\theta}$ are 
non-negative. It is to be expected that approximating the permanent 
is simpler in this case because with this restriction the sum 
in (1) contains only non-negative terms, i.e., the terms in this 
sum "interfere constructively." This is in contrast to the general 
case where the sum in (1) contains positive and negative 
terms, i.e., the terms in this sum "interfere constructively 
and destructively."^1 Despite this restriction to non-negative 
matrices, many interesting counting problems can be captured 
by this setup. 

Earlier work on approximating the permanent of a non- 
negative matrix includes Markov-chain-Monte-Carlo-based 
methods by Broder (see [4]), fully polynomial-time random- 
ized approximation schemes (FPRAS) [5], [6] (for more de- 
tails, in particular complexity estimates of these methods, see 
for example the discussion in [6]) and Bethe-approximation- 
based / sum-product-algorithm (SPA) based methods [7], [8]. 

^1 Strictly speaking, there are also matrices with positive and negative 
entries but where the product $\prod_{i \in [n]} \theta_{i,\sigma(i)}$ is non-negative for every $\sigma$. 



The study in this paper was very much motivated by this last 
set of papers on graphical-model-based methods, in particular 
by the fact that these methods yield algorithms that are very 
efficient and by the fact that the obtained permanent estimates 
have an accuracy that is good enough for many purposes. 

The main idea behind this graphical-model-based approach 
is to formulate a factor graph whose partition function equals 
the permanent that we are looking for. Consequently, the 
negative logarithm of the permanent equals the minimum of 
the so-called Gibbs free energy function that is associated with 
this factor graph. Although this is an elegant reformulation 
of the permanent computation problem, it does not yet 
yield any computational savings. Nevertheless, it suggests 
looking for a function that is tractable and whose minimum is 
close to the minimum of the Gibbs free energy function. One 
such function is the so-called Bethe free energy function [9], 
and with this, paralleling the above-mentioned relationship 
between the permanent and the minimum of the Gibbs free 
energy function, the Bethe permanent is defined such that its 
negative logarithm equals the global minimum of the Bethe 
free energy function. The Bethe free energy function is an 
interesting candidate because a theorem by Yedidia, Freeman, 
and Weiss [9] says that fixed points of the SPA correspond to 
stationary points of the Bethe free energy function. 

In general, this approach of replacing the Gibbs free energy 
function by the Bethe free energy function comes with very 
few guarantees, though. 

• The Bethe free energy function might have multiple local 
minima. 

• It is unclear how close the (global) minimum of the Bethe 
free energy function is to the minimum of the Gibbs free 
energy function. 

• It is unclear if the SPA converges, even to a local 
minimum of the Bethe free energy function. (As we will 
see, the factor graph that we use (cf. Figure 1) is not 
sparse and has many short cycles, in particular many four- 
cycles. These facts might suggest that the application of 
the SPA to this factor graph is rather problematic.) 

Luckily, in the case of the permanent approximation problem, 
one can formulate a factor graph where the Bethe free energy 
function is very well behaved. In particular, in this paper we 
discuss a factor graph that has the following properties. 

• We show that the Bethe free energy function is convex 
and therefore has no non-global local minima. 

• The minimum of the Bethe free energy function is quite 
close to the minimum of the Gibbs free energy function. 
Namely, as was recently shown by Gurvits [10], the 
permanent is lower bounded by the Bethe permanent. 
Moreover, we list conjectures on strict and probabilistic 
Bethe permanent based upper bounds on the permanent. 
In particular, for certain classes of square non-negative 
matrices, empirical evidence suggests that the permanent 
is upper bounded by some constant (that grows rather 
modestly with the matrix size) times the Bethe perma- 
nent. 

• We show that the SPA finds the minimum of the Bethe 
free energy function. In fact, the error between the 
iteration-dependent estimate of the Bethe permanent and 
the Bethe permanent itself decays exponentially fast, with 
an exponent depending on the matrix $\boldsymbol{\theta}$. Interestingly 
enough, in the associated convergence analysis a key role 
is played by a certain Markov chain that maximizes the 
sum of its entropy rate plus some average state transition 
cost. 

Besides leaving some questions open with respect to (w.r.t.) 
the Bethe free energy function (see, e.g., the above-mentioned 
conjectures concerning permanent upper bounds), these results 
by-and-large validate the empirical success, as observed by 
Chertkov et al. [7] and by Huang and Jebara [8], of approxi- 
mating the permanent by graphical-model-based methods. 

Let us remark that for many factor graphs with cycles the 
Bethe free energy function is not as well behaved as the 
Bethe free energy function under consideration in this paper. 
In particular, as discussed in [11], every code picked from 
an ensemble of regular low-density parity-check codes [12], 
where the ensemble is such that the minimum Hamming 
distance grows (with high probability) linearly with the block 
length, has a Bethe free energy function that is concave in 
certain regions of its domain. Nevertheless, decoding such 
codes with SPA-based decoders has been highly successful 
(see, e.g., [13]). 

C. Related Work 

The literature on permanents (and adjacent areas of counting 
perfect matchings, counting zero/one matrices with specified 
row and column sums, etc.) is vast. Therefore, we just mention 
works that are (to the best of our knowledge) the most relevant 
to the present paper. 

Besides the already mentioned papers [7], [8] on Bethe- 
approximation-based methods to the permanent of a non- 
negative matrix, some aspects of the Bethe free energy func- 
tion were analyzed by Watanabe and Chertkov in [14] and by 
Chertkov et al. in [15]. (In particular, the paper [14] applied 
the loop calculus technique by Chertkov and Chernyak [16].) 
Very recent work in that line of research is presented in a 
paper by A. B. Yedidia and Chertkov [17] that studies so- 
called fractional free energy functionals, and resulting lower 
and upper bounds on the permanent of a non-negative matrix. 

Because computing the permanent is related to counting 
perfect matchings, the paper by Bayati and Nair [18] on 
counting matchings in graphs with the help of the SPA is 
very relevant. Note that their setup is such that the perfect 
matching case can be seen as a limiting case (namely the zero- 
temperature limit) of the matching setup. However, for the 
perfect matching case (a case for which the authors of [18] 
make no claims) the convergence proof of the SPA in [18] 
is incomplete. Moreover, their matchings are weighted only 
inasmuch as the weight of a matching depends on the size 
of the matching. Consequently, because all perfect matchings 
have the same size, they all are assigned the same weight. 

Very relevant to the present paper are also papers on max- 
product algorithm / min-sum algorithm based approaches to 
the maximum weight perfect matching problem [19]-[21]. 
As shown in these papers, these algorithms find the desired 



solution efficiently, a fact which is strongly related to the 
observation that the linear programming relaxation of the 
underlying integer linear program is tight. This tightness in 
relaxation, which is an immediate consequence of a theorem 
by Birkhoff and von Neumann (see Theorem 3), goes also a 
long way towards explaining why the Bethe free energy func- 
tion under consideration in this paper is well behaved. Finally, 
let us remark that because the difference between two perfect 
matchings corresponds to a union of disjoint cycles, the max- 
product algorithm / min-sum algorithm convergence analysis 
in [19]-[21] has some resemblance with Wiberg's max-product 
algorithm / min-sum algorithm convergence analysis for so- 
called cycle codes [22]. 

The present paper has also some similarities with recent 
papers by Barvinok on counting zero/one matrices with pre- 
scribed row and column sums [23] and by Barvinok and 
Samorodnitsky on computing the partition function for perfect 
matchings in hypergraphs [24]. However, these papers pursue 
what would be called a mean-field theory approach in the 
physics literature [25]. An exception to the previous statement 
is Section 3.2 in [23], which contains Bethe-approximation- 
type computations. (See the references in that section for 
further papers that investigate similar approaches.) 

Finally, as already mentioned in the previous subsection, 
Gurvits's recent paper [10] contains important observations 
w.r.t. the relationship between the permanent and the Bethe 
permanent of a non-negative matrix, and puts them into the 
context of Schrijver's permanental inequality. 

D. Overview of the Paper 

This paper is structured as follows. We conclude this intro- 
ductory section with a discussion of some of the notation that 
is used. In Section II we then introduce the main normal factor 
graph (NFG) for this paper, in Section III we formally define 
the Bethe permanent, in Section IV we discuss properties of 
the Bethe entropy function and the Bethe free energy function, 
in Section V we analyze the SPA, in Section VI we give 
a "combinatorial characterization" of the Bethe permanent 
in terms of graph covers of the above-mentioned NFG, in 
Section VII we discuss Bethe-permanent-based bounds on the 
permanent, in Section VIII we list some thoughts on using 
the concept of the "fractional Bethe entropy function," in 
Section IX we list some conjectures, and we conclude the 
paper in Section X. Finally, the appendix contains some of 
the proofs. 



E. Basic Notations and Definitions 

This subsection discusses the most important notations that 
will be used in this paper. More notational definitions will be 
given in later sections. 

We let $\mathbb{R}$ be the field of real numbers, $\mathbb{R}_{\geq 0}$ be the set of 
non-negative real numbers, $\mathbb{R}_{>0}$ be the set of positive real numbers, 
$\mathbb{Z}$ be the ring of integers, $\mathbb{Z}_{\geq 0}$ be the set of non-negative 
integers, and $\mathbb{Z}_{>0}$ be the set of positive integers. Scalars 
are denoted by non-boldface characters, whereas vectors and 
matrices by boldface characters. For any positive integer $L$, 
the matrix $\mathbf{1}_{L \times L}$ is the all-one matrix of size $L \times L$. 

Assumption 2 Throughout this paper, if not mentioned otherwise, 
$n$ is a positive integer and $\boldsymbol{\theta} = (\theta_{i,j})_{i,j}$ is a non-negative 
matrix of size $n \times n$. □ 

We use calligraphic letters for sets, and the size of a set 
$\mathcal{S}$ is denoted by $|\mathcal{S}|$. The convex hull [26] of some subset $\mathcal{S}$ 
of some multi-dimensional real space is denoted by $\operatorname{conv}(\mathcal{S})$. 
For any positive integer $L$ we define $[L] \triangleq \{1, \ldots, L\}$. For 
any positive integer $L$, we define $\mathcal{P}_{L \times L}$ to be the set of all 
$L \times L$ permutation matrices, i.e., 

\[
  \mathcal{P}_{L \times L} \triangleq \left\{ \mathbf{P} \;\middle|\;
  \begin{array}{l}
    \text{$\mathbf{P}$ is a matrix of size $L \times L$} \\
    \text{$\mathbf{P}$ contains exactly one 1 per row} \\
    \text{$\mathbf{P}$ contains exactly one 1 per column} \\
    \text{$\mathbf{P}$ contains 0s otherwise}
  \end{array}
  \right\}.
\]

Clearly, there is a bijection between $\mathcal{P}_{L \times L}$ and the set of all 
permutations of $[L]$. Moreover, for a finite set $\mathcal{S}$, we define 
$\Pi_{\mathcal{S}}$ to be the set of probability mass functions over $\mathcal{S}$, i.e., 

\[
  \Pi_{\mathcal{S}} \triangleq \left\{ \mathbf{p} = (p_s)_{s \in \mathcal{S}} \;\middle|\;
  p_s \geq 0 \text{ for all } s \in \mathcal{S}, \ \sum_{s \in \mathcal{S}} p_s = 1 \right\}.
\]

Finally, for any positive integer $L$, we let $\Gamma_{L \times L}$ be the set of 
doubly stochastic matrices of size $L \times L$, i.e., 

\[
  \Gamma_{L \times L} \triangleq \left\{ \boldsymbol{\gamma} = (\gamma_{i,j})_{i,j} \;\middle|\;
  \begin{array}{l}
    \gamma_{i,j} \geq 0 \text{ for all } (i,j) \in [L] \times [L] \\
    \sum_{j \in [L]} \gamma_{i,j} = 1 \text{ for all } i \in [L] \\
    \sum_{i \in [L]} \gamma_{i,j} = 1 \text{ for all } j \in [L]
  \end{array}
  \right\}.
\]

In the rest of the paper, when appropriate, we will identify 
the set of $L \times L$ real matrices with the $L^2$-dimensional real 
space. In that sense, $\Gamma_{L \times L}$ can be seen as a polytope in the 
$L^2$-dimensional real space. Clearly, $\Gamma_{L \times L}$ is a convex set, 
and every permutation matrix of size $L \times L$ is a doubly 
stochastic matrix of size $L \times L$. Most interestingly, every 
doubly stochastic matrix of size $L \times L$ can be written as a 
convex combination of permutation matrices of size $L \times L$; 
this observation is a consequence of the important 
Birkhoff-von Neumann Theorem. 

Theorem 3 (Birkhoff-von Neumann Theorem) For any 
positive integer $L$, the set of doubly stochastic matrices of 
size $L \times L$ is a polytope whose vertex set equals the set of 
permutation matrices of size $L \times L$, i.e., 

\[
  \operatorname{vertex-set}(\Gamma_{L \times L}) = \mathcal{P}_{L \times L}.
\]

As a consequence, the set of doubly stochastic matrices of size 
$L \times L$ is the convex hull of the set of all permutation matrices 
of size $L \times L$, i.e., 

\[
  \Gamma_{L \times L} = \operatorname{conv}(\mathcal{P}_{L \times L}).
\]

Proof: See, e.g., [27, Section 8.7]. □ 

Finally, all logarithms will be natural logarithms and the 
value of $0 \cdot \log(0)$ is defined to be equal to 0. 
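
The consequence of Theorem 3 can be made tangible with a small constructive sketch: repeatedly find a permutation supported on the positive entries of the matrix, subtract the largest possible multiple of it, and continue until nothing is left. This greedy procedure is a standard way to obtain one such convex combination; it is not taken from the paper, and the names `perfect_matching` and `birkhoff_decomposition` are hypothetical. Exact rational arithmetic (`fractions.Fraction`) is assumed so that no tolerance handling is needed.

```python
from fractions import Fraction


def perfect_matching(support):
    """Augmenting-path bipartite matching; support[i] lists the columns allowed for row i.

    Returns a list sigma with sigma[i] = column matched to row i, or None if
    no perfect matching exists.
    """
    n = len(support)
    col_owner = [None] * n

    def try_row(i, seen):
        for j in support[i]:
            if j in seen:
                continue
            seen.add(j)
            if col_owner[j] is None or try_row(col_owner[j], seen):
                col_owner[j] = i
                return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    sigma = [None] * n
    for j, i in enumerate(col_owner):
        sigma[i] = j
    return sigma


def birkhoff_decomposition(gamma):
    """Write a doubly stochastic matrix as a list of (weight, permutation) pairs."""
    n = len(gamma)
    rest = [[Fraction(x) for x in row] for row in gamma]
    terms = []
    while any(x > 0 for row in rest for x in row):
        support = [[j for j in range(n) if rest[i][j] > 0] for i in range(n)]
        sigma = perfect_matching(support)
        if sigma is None:
            raise ValueError("row and column sums are not all equal")
        weight = min(rest[i][sigma[i]] for i in range(n))
        for i in range(n):
            rest[i][sigma[i]] -= weight  # zeroes at least one entry per round
        terms.append((weight, sigma))
    return terms


F = Fraction
gamma = [[F(1, 2), F(1, 2), F(0)], [F(1, 4), F(1, 4), F(1, 2)], [F(1, 4), F(1, 4), F(1, 2)]]
for weight, sigma in birkhoff_decomposition(gamma):
    print(weight, sigma)
```

The decomposition is in general not unique; the sketch returns whichever one the matching routine happens to find.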



II. Normal Factor Graph Representation 

Factor graphs are a convenient way to represent multivariate 
functions [28]. In this paper we use a variant called "normal 
factor graphs (NFGs)" [29] (also called "Forney-style factor 
graphs" [30]), where variables are associated with edges. 

As already mentioned in the introduction, the main idea 
behind the graphical-model-based approach to estimating the 
permanent is to formulate an NFG such that its partition 
function equals the permanent. There are of course different 
ways to do this and typically different formulations will yield 
different results when estimating the permanent with sub-optimal 
algorithms like the SPA. It is well known that when 
the NFG has no cycles, the SPA computes the partition 
function exactly. However, for the given problem any NFG 
without cycles yields highly inefficient SPA update rules for 
reasonably large n (otherwise there would be a contradiction 
to the considerations in Section I-A), and so we will focus on 
NFGs with cycles. The NFG that is introduced in the following 
definition and that is based on a complete bipartite graph with 
two times n vertices, is a rather natural candidate, and, as we 
will see, has very interesting and useful properties. 




Fig. 1. The NFG $\mathsf{N}(\boldsymbol{\theta})$, which is based on a complete bipartite graph with two 
times n vertices (here n = 5). The left function nodes represent the functions 
$\{g_i\}_{i \in \mathcal{I}}$, the right function nodes represent the functions $\{g_j\}_{j \in \mathcal{J}}$, and with 
the edge $e = (i,j)$ we associate the variable $A_e = A_{i,j}$. (See Definition 4 
for more details.) 



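Definition 4 We define the NFG 
$\mathsf{N}(\boldsymbol{\theta}) \triangleq \mathsf{N}(\mathcal{F}, \mathcal{E}, \mathcal{A}, \mathcal{G})$ 
as follows (see also Figure 1). 

• The set of vertices is $\mathcal{F} \triangleq \mathcal{I} \cup \mathcal{J}$, where $\mathcal{I} \triangleq [n]$ will be 
called the set of left vertices and $\mathcal{J} \triangleq [n]$ will be called 
the set of right vertices.^2 

• The set of edges is $\mathcal{E} \triangleq \mathcal{I} \times \mathcal{J} = \{ (i,j) \mid i \in \mathcal{I}, j \in \mathcal{J} \}$. 
Moreover, all edges are full edges, i.e., $\mathcal{E}_{\mathrm{full}} = \mathcal{E}$ and 
$\mathcal{E}_{\mathrm{half}} = \{ \}$. 

• With every edge $e = (i,j) \in \mathcal{E}$ we associate the variable 
$A_e = A_{i,j}$ with alphabet $\mathcal{A}_e = \mathcal{A}_{i,j} \triangleq \{0, 1\}$; a 
realization of $A_e = A_{i,j}$ will be denoted by $a_e = a_{i,j}$. 

• The set $\mathcal{A} \triangleq \prod_e \mathcal{A}_e = \prod_{i,j} \mathcal{A}_{i,j}$ will be called the 
configuration set, and so 

\[
  \mathbf{a} = (a_e)_{e \in \mathcal{E}} = (a_{i,j})_{(i,j) \in \mathcal{I} \times \mathcal{J}} \in \mathcal{A}
\]

will be called a configuration. For a given vector $\mathbf{a}$, we 
also define the sub-vectors 

\[
  \mathbf{a}_i \triangleq (a_{i,j})_{j \in \mathcal{J}} \quad \text{and} \quad \mathbf{a}_j \triangleq (a_{i,j})_{i \in \mathcal{I}}.
\]

When convenient, the vector $\mathbf{a}$ will be considered to be 
an $n \times n$ matrix. Then $\mathbf{a}_i$ corresponds to the i-th row of 
$\mathbf{a}$, and $\mathbf{a}_j$ corresponds to the j-th column of $\mathbf{a}$. 

• For every $i \in \mathcal{I}$ we define the local function^{3,4} 

\[
  g_i : \prod_{j'} \mathcal{A}_{i,j'} \to \mathbb{R}, \quad
  \mathbf{a}_i \mapsto
  \begin{cases}
    \sqrt{\theta_{i,j}} & \text{(if $\mathbf{a}_i = \mathbf{u}_j$)} \\
    0 & \text{(otherwise)}
  \end{cases}
\]

Similarly, for every $j \in \mathcal{J}$ we define the local function 

\[
  g_j : \prod_{i'} \mathcal{A}_{i',j} \to \mathbb{R}, \quad
  \mathbf{a}_j \mapsto
  \begin{cases}
    \sqrt{\theta_{i,j}} & \text{(if $\mathbf{a}_j = \mathbf{u}_i$)} \\
    0 & \text{(otherwise)}
  \end{cases}
\]

• For every $i \in \mathcal{I}$ we define the function node alphabet $\mathcal{A}_i$ 
to be the set 

\[
  \mathcal{A}_i \triangleq \Bigl\{ \mathbf{a}_i \in \prod_{j'} \mathcal{A}_{i,j'} \Bigm| g_i(\mathbf{a}_i) \neq 0 \Bigr\} = \{ \mathbf{u}_j \mid j \in \mathcal{J} \}.
\]

Similarly, for every $j \in \mathcal{J}$ we define the function node 
alphabet $\mathcal{A}_j$ to be the set 

\[
  \mathcal{A}_j \triangleq \Bigl\{ \mathbf{a}_j \in \prod_{i'} \mathcal{A}_{i',j} \Bigm| g_j(\mathbf{a}_j) \neq 0 \Bigr\} = \{ \mathbf{u}_i \mid i \in \mathcal{I} \}.
\]

(The sets $\mathcal{A}_i$ and $\mathcal{A}_j$ are also known as local constraint 
code of the function nodes i and j, respectively.) 

• The global function g is defined to be 

\[
  g : \mathcal{A} \to \mathbb{R}, \quad
  \mathbf{a} \mapsto \Bigl( \prod_i g_i(\mathbf{a}_i) \Bigr) \cdot \Bigl( \prod_j g_j(\mathbf{a}_j) \Bigr).
\]

• A configuration $\mathbf{c}$ with $g(\mathbf{c}) \neq 0$ will be called a valid 
configuration. The set of all valid configurations, i.e., 

\[
  \mathcal{C}_{\mathcal{E}} \triangleq \left\{ (c_{i,j})_{(i,j) \in \mathcal{I} \times \mathcal{J}} \;\middle|\;
  \begin{array}{l}
    c_{i,j} \in \mathcal{A}_{i,j}, \ (i,j) \in \mathcal{I} \times \mathcal{J} \\
    \mathbf{c}_i \in \mathcal{A}_i, \ i \in \mathcal{I} \\
    \mathbf{c}_j \in \mathcal{A}_j, \ j \in \mathcal{J}
  \end{array}
  \right\},
\]

will be called the global behavior of $\mathsf{N}(\boldsymbol{\theta})$. Considering 
the elements of $\mathcal{C}_{\mathcal{E}}$ as $n \times n$ matrices, it can easily be 
verified that $\mathcal{C}_{\mathcal{E}} = \mathcal{P}_{n \times n}$. This allows us to associate with 
$\mathbf{c} \in \mathcal{C}_{\mathcal{E}}$ the permutation $\sigma_{\mathbf{c}} : [n] \to [n]$ that maps $i \in \mathcal{I}$ 
to $j \in \mathcal{J}$ if $c_{i,j} = 1$. □ 

^2 Here, $\mathcal{F} = \mathcal{I} \cup \mathcal{J}$ stands for the more cumbersome $\mathcal{F} = (\{\text{left}\} \times \mathcal{I}) \cup 
(\{\text{right}\} \times \mathcal{J})$. In the following, i (and variations thereof) will refer to a left 
vertex and j (and variations thereof) will refer to a right vertex. In that spirit, 
variables like $\eta_i$ and $\eta_j$ are different variables, also if i = j. 

^3 Here and in the following, $\mathbf{u}_j$, $j \in \mathcal{J}$, stands for the length-n vector 
where all entries are zero except for the j-th entry that equals 1. The vector 
$\mathbf{u}_i$, $i \in \mathcal{I}$, is defined similarly. 

^4 Here and in the following, we will use the short-hands $\sum_i$, $\sum_j$, $\sum_{i'}$, 
$\sum_{j'}$, $\sum_e$, $\sum_{e'}$ for $\sum_{i \in \mathcal{I}}$, $\sum_{j \in \mathcal{J}}$, $\sum_{i' \in \mathcal{I}}$, $\sum_{j' \in \mathcal{J}}$, $\sum_{e \in \mathcal{E}}$, $\sum_{e' \in \mathcal{E}}$, 
respectively, with similar conventions for products. 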



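Lemma 5 Consider the NFG $\mathsf{N}(\boldsymbol{\theta})$ and let $\mathbf{c} \in \mathcal{C}_{\mathcal{E}}$ be a valid 
configuration of it. Then 

\[
\begin{aligned}
  g_i(\mathbf{c}_i) &= \sqrt{\theta_{i,\sigma_{\mathbf{c}}(i)}}, \quad i \in \mathcal{I}, \\
  g_j(\mathbf{c}_j) &= \sqrt{\theta_{\sigma_{\mathbf{c}}^{-1}(j),j}}, \quad j \in \mathcal{J}, \\
  g(\mathbf{c}) &= \prod_{i \in [n]} \theta_{i,\sigma_{\mathbf{c}}(i)}.
\end{aligned}
\]

Proof: The first two expressions follow easily from the definitions 
of $g_i$ and $g_j$ in Definition 4. The third expression follows 
from 

\[
  g(\mathbf{c})
  = \Bigl( \prod_i g_i(\mathbf{c}_i) \Bigr) \cdot \Bigl( \prod_j g_j(\mathbf{c}_j) \Bigr)
  = \Bigl( \prod_i \sqrt{\theta_{i,\sigma_{\mathbf{c}}(i)}} \Bigr) \cdot \Bigl( \prod_j \sqrt{\theta_{\sigma_{\mathbf{c}}^{-1}(j),j}} \Bigr)
  = \prod_i \theta_{i,\sigma_{\mathbf{c}}(i)},
\]

where the last step uses the fact that $j = \sigma_{\mathbf{c}}(i)$ runs over $\mathcal{J}$ 
exactly once as i runs over $\mathcal{I}$. □ 

Definition 6 The (Gibbs) partition function of the NFG $\mathsf{N}(\boldsymbol{\theta})$ 
is defined to be the sum of the global function over all 
configurations, or, equivalently, the sum of the global function 
over all valid configurations, i.e., 

\[
  Z_{\mathrm{G}} \triangleq \sum_{\mathbf{a} \in \mathcal{A}} g(\mathbf{a}) = \sum_{\mathbf{c} \in \mathcal{C}_{\mathcal{E}}} g(\mathbf{c}). \tag{2}
\]

In the following, when confusion can arise what NFG a certain 
Gibbs partition function is referring to, we will use $Z_{\mathrm{G}}(\mathsf{N}(\boldsymbol{\theta}))$, 
etc., instead of $Z_{\mathrm{G}}$.^5 □ 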

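Definition 7 The Gibbs free energy function associated with 
the NFG $\mathsf{N}(\boldsymbol{\theta})$ is defined to be 

\[
  F_{\mathrm{G}} : \Pi_{\mathcal{C}_{\mathcal{E}}} \to \mathbb{R}, \quad \mathbf{p} \mapsto U_{\mathrm{G}}(\mathbf{p}) - H_{\mathrm{G}}(\mathbf{p}),
\]

where 

\[
\begin{aligned}
  U_{\mathrm{G}} &: \Pi_{\mathcal{C}_{\mathcal{E}}} \to \mathbb{R}, \quad \mathbf{p} \mapsto - \sum_{\mathbf{c} \in \mathcal{C}_{\mathcal{E}}} p_{\mathbf{c}} \cdot \log\bigl( g(\mathbf{c}) \bigr), \\
  H_{\mathrm{G}} &: \Pi_{\mathcal{C}_{\mathcal{E}}} \to \mathbb{R}, \quad \mathbf{p} \mapsto - \sum_{\mathbf{c} \in \mathcal{C}_{\mathcal{E}}} p_{\mathbf{c}} \cdot \log( p_{\mathbf{c}} ).
\end{aligned}
\]

Here, $U_{\mathrm{G}}$ is called the Gibbs average energy function and $H_{\mathrm{G}}$ 
is called the Gibbs entropy function. In the following, when 
confusion can arise what NFG a certain Gibbs free energy 
function is referring to, we will use $F_{\mathrm{G},\mathsf{N}(\boldsymbol{\theta})}$, etc., instead of 
$F_{\mathrm{G}}$. Similar comments apply to $U_{\mathrm{G}}$ and $H_{\mathrm{G}}$. □ 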

For more details on these functions we refer to, e.g., [9]. 
For a discussion of these functions in the context of NFGs we 
refer to, e.g., [11]. Note that $H_{\mathrm{G}}$ is a concave function of $\mathbf{p}$, 
that $U_{\mathrm{G}}$ is a linear function of $\mathbf{p}$, and that, consequently, $F_{\mathrm{G}}$ 
is a convex function of $\mathbf{p}$. 



^5 Note that "function" in "partition function" refers to the fact that the 
expression in (2) typically is a function of some parameters like the temperature 
T (see the discussion below). A better word for "partition function" would 
possibly be "partition sum" or "state sum," which would more closely follow 
the German "Zustandssumme" whose first letter is used to denote the partition 
function. 



Lemma 8 The permanent of $\boldsymbol{\theta}$ can be expressed in terms of 
the Gibbs partition function or in terms of the minimum of the 
Gibbs free energy function of $\mathsf{N}(\boldsymbol{\theta})$. Namely, 

\[
  \operatorname{perm}(\boldsymbol{\theta}) = Z_{\mathrm{G}} = \exp\Bigl( - \min_{\mathbf{p}} F_{\mathrm{G}}(\mathbf{p}) \Bigr), \tag{3}
\]

where the minimization is over $\mathbf{p} \in \Pi_{\mathcal{C}_{\mathcal{E}}}$. 

Proof: The first equality is a straightforward consequence of 
Definitions 1 and 4, along with Lemma 5. For the second 
equality we refer to, e.g., [9], [11]. □ 
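
Both equalities in (3) can be checked numerically on a tiny example. The sketch below is an illustration only and assumes a positive matrix: it evaluates the global function through the local functions of Definition 4, sums it over all $2^{n^2}$ configurations, and evaluates the Gibbs free energy at the distribution $p_{\mathbf{c}} = g(\mathbf{c}) / Z_{\mathrm{G}}$, which is the standard minimizer. All function names are hypothetical.

```python
from itertools import permutations, product
from math import exp, isclose, log, sqrt


def local_function(theta_line, a_line):
    """g_i or g_j: sqrt of the selected theta entry if a_line is a unit vector, else 0."""
    if sum(a_line) != 1:
        return 0.0
    return sqrt(theta_line[a_line.index(1)])


def global_function(theta, a):
    """Product of all left (row) and right (column) local functions."""
    n = len(theta)
    value = 1.0
    for i in range(n):
        value *= local_function(theta[i], a[i])
    for j in range(n):
        column_theta = [theta[i][j] for i in range(n)]
        column_a = [a[i][j] for i in range(n)]
        value *= local_function(column_theta, column_a)
    return value


def gibbs_partition_function(theta):
    """Sum of the global function over all 2^(n*n) zero/one configurations."""
    n = len(theta)
    total = 0.0
    for bits in product((0, 1), repeat=n * n):
        a = [bits[i * n:(i + 1) * n] for i in range(n)]
        total += global_function(theta, a)
    return total


def gibbs_free_energy_at_gibbs_distribution(theta):
    """F_G(p) = U_G(p) - H_G(p) evaluated at p_c = g(c) / Z_G (theta assumed positive)."""
    n = len(theta)
    weights = []
    for sigma in permutations(range(n)):
        w = 1.0
        for i in range(n):
            w *= theta[i][sigma[i]]
        weights.append(w)
    z = sum(weights)
    u = -sum(w / z * log(w) for w in weights)
    h = -sum(w / z * log(w / z) for w in weights)
    return u - h


theta = [[1.0, 2.0], [3.0, 4.0]]
z_g = gibbs_partition_function(theta)
f_g = gibbs_free_energy_at_gibbs_distribution(theta)
print(isclose(z_g, 10.0), isclose(exp(-f_g), 10.0))
```

Only configurations that are permutation matrices contribute to the sum, which is the statement $\mathcal{C}_{\mathcal{E}} = \mathcal{P}_{n \times n}$ from Definition 4.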

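The Gibbs partition function $Z_{\mathrm{G}}$ and the Gibbs free energy 
function $F_{\mathrm{G}}$ were specified for temperature T = 1 in the above 
definitions. For a general temperature parameter $T \in \mathbb{R}_{>0}$, 
these functions have to be replaced by $Z_{\mathrm{G}} \triangleq \sum_{\mathbf{c} \in \mathcal{C}_{\mathcal{E}}} g(\mathbf{c})^{1/T}$ 
and by $F_{\mathrm{G}}(\mathbf{p}) \triangleq U_{\mathrm{G}}(\mathbf{p}) - T \cdot H_{\mathrm{G}}(\mathbf{p})$, respectively, and 
Lemma 8 has to be replaced by $Z_{\mathrm{G}} = \exp\bigl( - \frac{1}{T} \min_{\mathbf{p}} F_{\mathrm{G}}(\mathbf{p}) \bigr)$. 
Of course, $Z_{\mathrm{G}} = \operatorname{perm}(\boldsymbol{\theta})$ does not hold anymore, unless a 
suitable T-dependence is built into the definition of $\operatorname{perm}(\boldsymbol{\theta})$. 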

III. The Bethe Permanent 

Although the reformulation of the permanent in the above 
lemma in terms of a convex minimization problem is elegant, 
from a computational perspective it does not buy us much. 
However, it suggests looking for a minimization problem that 
can be solved efficiently and whose minimal value is related 
to the desired quantity. This is the approach that is taken in 
this section and will be based on the Bethe approximation of 
the Gibbs free energy function: the resulting approximation 
of the permanent of a non-negative square matrix will be 
called the Bethe permanent. (Note that in this section we give 
the technical details only; for a general discussion w.r.t. the 
motivations behind the Bethe approximation we refer to [9], 
and for a discussion of the Bethe approximation in the context 
of NFGs we refer to [11].) 

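Definition 9 Consider the NFG $\mathsf{N}(\boldsymbol{\theta})$. We let 

\[
  \boldsymbol{\beta} \triangleq \bigl( (\boldsymbol{\beta}_i)_{i \in \mathcal{I}}, \ (\boldsymbol{\beta}_j)_{j \in \mathcal{J}}, \ (\boldsymbol{\beta}_e)_{e \in \mathcal{E}} \bigr)
\]

be a collection of vectors based on the real vectors 

\[
\begin{aligned}
  \boldsymbol{\beta}_i &\triangleq (\beta_{i,\mathbf{a}_i})_{\mathbf{a}_i \in \mathcal{A}_i}, \\
  \boldsymbol{\beta}_j &\triangleq (\beta_{j,\mathbf{a}_j})_{\mathbf{a}_j \in \mathcal{A}_j}, \\
  \boldsymbol{\beta}_e &\triangleq (\beta_{e,a_e})_{a_e \in \mathcal{A}_e}.
\end{aligned}
\]

Moreover, we define the sets 

\[
\begin{aligned}
  \mathcal{B}_i &\triangleq \Pi_{\mathcal{A}_i}, \quad i \in \mathcal{I}, \\
  \mathcal{B}_j &\triangleq \Pi_{\mathcal{A}_j}, \quad j \in \mathcal{J}, \\
  \mathcal{B}_e &\triangleq \Pi_{\mathcal{A}_e}, \quad e \in \mathcal{E},
\end{aligned}
\]

and call $\mathcal{B}_i$, $\mathcal{B}_j$, and $\mathcal{B}_e$ the ith local marginal polytope, the 
jth local marginal polytope, and the eth local marginal polytope, 
respectively. (Sometimes $\mathcal{B}_i$ is also called the ith belief 
polytope, etc.) 

With this, the local marginal polytope (or belief polytope) 
$\mathcal{B}$ is defined to be the set 

\[
  \mathcal{B} \triangleq \left\{ \boldsymbol{\beta} \;\middle|\;
  \begin{array}{l}
    \boldsymbol{\beta}_i \in \mathcal{B}_i \text{ for all } i \in \mathcal{I} \\
    \boldsymbol{\beta}_j \in \mathcal{B}_j \text{ for all } j \in \mathcal{J} \\
    \boldsymbol{\beta}_e \in \mathcal{B}_e \text{ for all } e \in \mathcal{E} \\[1ex]
    \sum_{\mathbf{a}'_i \in \mathcal{A}_i : \, a'_{i,j} = a_e} \beta_{i,\mathbf{a}'_i} = \beta_{e,a_e}
      \text{ for all } e = (i,j) \in \mathcal{E}, \ a_e \in \mathcal{A}_e \\[1ex]
    \sum_{\mathbf{a}'_j \in \mathcal{A}_j : \, a'_{i,j} = a_e} \beta_{j,\mathbf{a}'_j} = \beta_{e,a_e}
      \text{ for all } e = (i,j) \in \mathcal{E}, \ a_e \in \mathcal{A}_e
  \end{array}
  \right\},
\]

where $\boldsymbol{\beta} \in \mathcal{B}$ is called a pseudo-marginal vector. (The two 
constraints that were listed last in the definition of $\mathcal{B}$ will be 
called "edge consistency constraints.") □ 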

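Definition 10 The Bethe free energy function associated with 
the NFG $\mathsf{N}(\boldsymbol{\theta})$ is defined to be the function 

\[
  F_{\mathrm{B}} : \mathcal{B} \to \mathbb{R}, \quad \boldsymbol{\beta} \mapsto U_{\mathrm{B}}(\boldsymbol{\beta}) - H_{\mathrm{B}}(\boldsymbol{\beta}),
\]

where 

\[
\begin{aligned}
  U_{\mathrm{B}} &: \mathcal{B} \to \mathbb{R}, \quad \boldsymbol{\beta} \mapsto \sum_i U_{\mathrm{B},i}(\boldsymbol{\beta}_i) + \sum_j U_{\mathrm{B},j}(\boldsymbol{\beta}_j), \\
  H_{\mathrm{B}} &: \mathcal{B} \to \mathbb{R}, \quad \boldsymbol{\beta} \mapsto \sum_i H_{\mathrm{B},i}(\boldsymbol{\beta}_i) + \sum_j H_{\mathrm{B},j}(\boldsymbol{\beta}_j) - \sum_e H_{\mathrm{B},e}(\boldsymbol{\beta}_e),
\end{aligned}
\]

with^6 

\[
\begin{aligned}
  U_{\mathrm{B},i} &: \mathcal{B}_i \to \mathbb{R}, \quad \boldsymbol{\beta}_i \mapsto - \sum_{\mathbf{a}_i} \beta_{i,\mathbf{a}_i} \cdot \log\bigl( g_i(\mathbf{a}_i) \bigr), \\
  U_{\mathrm{B},j} &: \mathcal{B}_j \to \mathbb{R}, \quad \boldsymbol{\beta}_j \mapsto - \sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} \cdot \log\bigl( g_j(\mathbf{a}_j) \bigr), \\
  H_{\mathrm{B},i} &: \mathcal{B}_i \to \mathbb{R}, \quad \boldsymbol{\beta}_i \mapsto - \sum_{\mathbf{a}_i} \beta_{i,\mathbf{a}_i} \cdot \log( \beta_{i,\mathbf{a}_i} ), \\
  H_{\mathrm{B},j} &: \mathcal{B}_j \to \mathbb{R}, \quad \boldsymbol{\beta}_j \mapsto - \sum_{\mathbf{a}_j} \beta_{j,\mathbf{a}_j} \cdot \log( \beta_{j,\mathbf{a}_j} ), \\
  H_{\mathrm{B},e} &: \mathcal{B}_e \to \mathbb{R}, \quad \boldsymbol{\beta}_e \mapsto - \sum_{a_e} \beta_{e,a_e} \cdot \log( \beta_{e,a_e} ).
\end{aligned}
\]

Here, $U_{\mathrm{B}}$ is the Bethe average energy function and $H_{\mathrm{B}}$ is 
the Bethe entropy function. In the following, when confusion 
can arise what NFG a certain Bethe free energy function is 
referring to, we will use $F_{\mathrm{B},\mathsf{N}(\boldsymbol{\theta})}$, etc., instead of $F_{\mathrm{B}}$. Similar 
comments apply to $U_{\mathrm{B}}$ and $H_{\mathrm{B}}$. □ 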

With this, the Bethe partition function of an NFG is defined 
such that an equality analogous to the second equality in (3) 
holds. 

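Definition 11 The Bethe partition function of the NFG $\mathsf{N}(\boldsymbol{\theta})$ 
is defined to be 

\[
  Z_{\mathrm{B}} \triangleq \exp\Bigl( - \min_{\boldsymbol{\beta} \in \mathcal{B}} F_{\mathrm{B}}(\boldsymbol{\beta}) \Bigr).
\]

In the following, when confusion can arise what NFG a certain 
Bethe partition function is referring to, we will use $Z_{\mathrm{B}}(\mathsf{N})$, 
etc., instead of $Z_{\mathrm{B}}$. □ 

^6 Here and in the following, we use the short-hand $\sum_{\mathbf{a}_i}$ for $\sum_{\mathbf{a}_i \in \mathcal{A}_i}$, etc. 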

The next definition is the main definition of this paper and 
was motivated by the work of Chertkov et al. [7] and by the 
work of Huang and Jebara [8]. 

Definition 12 Consider the NFG $\mathsf{N}(\boldsymbol{\theta})$. The Bethe permanent 
of $\boldsymbol{\theta}$, which will be denoted by $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$, is defined to be 

\[
  \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}) \triangleq Z_{\mathrm{B}}\bigl( \mathsf{N}(\boldsymbol{\theta}) \bigr).
\]

A similar comment w.r.t. a temperature parameter $T \in \mathbb{R}_{>0}$ 
as at the end of Section II applies also to the definition of the 
Bethe partition function and the Bethe free energy function. 
In the following, however, we will only consider the case 
T = 1. An exception is Section VIII on the fractional Bethe 
approximation: this approximation can be viewed as introduc- 
ing multiple temperature parameters, namely one temperature 
parameter for every term of Hb, and therefore includes the 
single temperature parameter case as a special case. 

IV. Properties of the Bethe Entropy Function 
and the Bethe Free Energy Function 

There are relatively few general statements about the shape 
of the Bethe entropy function. In this section we show that 
the Bethe entropy function associated with $\mathsf{N}(\boldsymbol{\theta})$ has many special 
properties. 

• In general, the Bethe entropy function is not a con- 
cave function. However, here we show that the Bethe 
entropy function under consideration, when suitably 
parametrized, is a concave function. 

Similarly, the Bethe free energy function is in general 
not a convex function. However, because the Bethe free 
energy function is the difference of the Bethe average en- 
ergy function and the Bethe entropy function, because the 
Bethe average energy function is linear in its arguments, 
and because the Bethe entropy function is concave, the 
Bethe free energy function under consideration is convex 
and does not have non-global local minima.^7 

• In general, the Bethe entropy function can take on 
positive, zero, and negative values. However, here we 
show that the Bethe entropy under consideration is non- 
negative. 

• Very often, the directional derivative of the Bethe entropy 
function away from a vertex of its domain is $+\infty$ or $-\infty$. 
Here we show that the directional derivative of the Bethe 
entropy function under consideration has a (non-negative) 
finite slope away from any vertex of its domain. (As we 
will see in Section V, this observation will have important 
consequences for the SPA convergence analysis.) 

^7 The fact that convexity / non-convexity of a function depends on its 
parametrization might explain the non-convexity observations in [8, Sec- 
tion 3.3] w.r.t. the Bethe free energy function. 



A. Reformulation of the Bethe Entropy Function 
and the Bethe Free Energy Function 

As mentioned in Section I-C, the success of the max-product 
algorithm / min-sum algorithm based approaches 
to the maximum weight perfect matching problem in the 
papers [19]-[21] was heavily based on a theorem by Birkhoff 
and von Neumann (cf. Theorem 3). This theorem is equally 
central to the results of the present paper. Namely, in the next 
lemma we introduce a parametrization of the belief polytope 
$\mathcal{B}$ based on $\Gamma_{n \times n}$ that will be used for the rest of the paper. 

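Lemma 13 Consider the NFG $\mathsf{N}(\boldsymbol{\theta})$. Its belief polytope $\mathcal{B}$ can 
be parametrized by $\Gamma_{n \times n}$, the set of doubly stochastic matrices 
of size $n \times n$. In particular, we define the parametrization 
such that the matrix $\boldsymbol{\gamma} = (\gamma_{i,j})_{(i,j) \in \mathcal{I} \times \mathcal{J}} \in \Gamma_{n \times n}$ indexes the 
pseudo-marginal vector $\boldsymbol{\beta} \in \mathcal{B}$ with 

\[
  \beta_{i,\mathbf{a}_i} \Big|_{\mathbf{a}_i = \mathbf{u}_j}
  = \beta_{j,\mathbf{a}_j} \Big|_{\mathbf{a}_j = \mathbf{u}_i}
  = \gamma_{i,j}
\]

and 

\[
  \beta_{e,a_e} \Big|_{a_e = 0} = 1 - \gamma_{i,j}, \qquad
  \beta_{e,a_e} \Big|_{a_e = 1} = \gamma_{i,j},
\]

for every $i \in \mathcal{I}$, $j \in \mathcal{J}$, and $e = (i,j) \in \mathcal{E}$. 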



Proof: It is straightforward to verify that the pseudo-marginal 
vector $\boldsymbol{\beta}$ which is specified in the lemma statement is indeed 
in $\mathcal{B}$. Moreover, with the help of Theorem 3 one can verify that 
for every pseudo-marginal vector $\boldsymbol{\beta} \in \mathcal{B}$ there is a $\boldsymbol{\gamma} \in \Gamma_{n \times n}$ 
such that $\boldsymbol{\gamma}$ indexes $\boldsymbol{\beta}$. □ 
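
The first direction of the proof can be replayed mechanically. The sketch below builds the pseudo-marginal vector of Lemma 13 from a matrix $\boldsymbol{\gamma}$ and tests the membership conditions of Definition 9, including both edge consistency constraints. It is an illustration under assumptions: the data layout (nested lists and a dictionary keyed by edges) and the function names are hypothetical, and exact rationals are used so that equalities can be tested directly.

```python
from fractions import Fraction


def pseudo_marginals(gamma):
    """Build the vectors of Lemma 13 from an n x n matrix gamma."""
    n = len(gamma)
    # beta_left[i][j]  plays the role of beta_{i, a_i} at a_i = u_j
    beta_left = [[gamma[i][j] for j in range(n)] for i in range(n)]
    # beta_right[j][i] plays the role of beta_{j, a_j} at a_j = u_i
    beta_right = [[gamma[i][j] for i in range(n)] for j in range(n)]
    # beta_edge[(i, j)][a_e] for a_e in {0, 1}
    beta_edge = {(i, j): (1 - gamma[i][j], gamma[i][j]) for i in range(n) for j in range(n)}
    return beta_left, beta_right, beta_edge


def in_belief_polytope(gamma):
    """Check the constraints of Definition 9 for the vector built from gamma."""
    n = len(gamma)
    beta_left, beta_right, beta_edge = pseudo_marginals(gamma)
    if any(x < 0 or x > 1 for row in gamma for x in row):
        return False
    # every beta_i and beta_j has to be a probability mass function
    if any(sum(pmf) != 1 for pmf in beta_left + beta_right):
        return False
    for (i, j), edge_pmf in beta_edge.items():
        for a_e in (0, 1):
            # the unit vector u_{j'} has a 1 in position j exactly when j' == j
            from_left = sum(beta_left[i][jp] for jp in range(n) if int(jp == j) == a_e)
            from_right = sum(beta_right[j][ip] for ip in range(n) if int(ip == i) == a_e)
            if from_left != edge_pmf[a_e] or from_right != edge_pmf[a_e]:
                return False
    return True


F = Fraction
print(in_belief_polytope([[F(1, 3), F(2, 3)], [F(2, 3), F(1, 3)]]))
print(in_belief_polytope([[F(1, 2), F(1, 4)], [F(1, 2), F(3, 4)]]))
```

The second matrix has row sums different from 1, so it does not index a pseudo-marginal vector; the edge consistency constraints for $a_e = 0$ are precisely the row and column sum conditions.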

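In the following, for a given matrix $\boldsymbol{\gamma} = (\gamma_{i,j})_{(i,j) \in \mathcal{I} \times \mathcal{J}}$, 
the i-th row of $\boldsymbol{\gamma}$ will be denoted by $\boldsymbol{\gamma}_i \triangleq (\gamma_{i,j})_{j \in \mathcal{J}}$ and the 
j-th column of $\boldsymbol{\gamma}$ will be denoted by $\boldsymbol{\gamma}_j \triangleq (\gamma_{i,j})_{i \in \mathcal{I}}$. 

The above observations allow us to express the Bethe free 
energy function and related functions in terms of $\boldsymbol{\gamma} \in \Gamma_{n \times n}$. 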

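Lemma 14 Consider the NFG $\mathsf{N}(\boldsymbol{\theta})$. Then 

\[
  F_{\mathrm{B}} : \Gamma_{n \times n} \to \mathbb{R}, \quad \boldsymbol{\gamma} \mapsto U_{\mathrm{B}}(\boldsymbol{\gamma}) - H_{\mathrm{B}}(\boldsymbol{\gamma}),
\]

where 

\[
\begin{aligned}
  U_{\mathrm{B}} &: \Gamma_{n \times n} \to \mathbb{R}, \quad \boldsymbol{\gamma} \mapsto \sum_i U_{\mathrm{B},i}(\boldsymbol{\gamma}_i) + \sum_j U_{\mathrm{B},j}(\boldsymbol{\gamma}_j), \\
  H_{\mathrm{B}} &: \Gamma_{n \times n} \to \mathbb{R}, \quad \boldsymbol{\gamma} \mapsto \sum_i H_{\mathrm{B},i}(\boldsymbol{\gamma}_i) + \sum_j H_{\mathrm{B},j}(\boldsymbol{\gamma}_j) - \sum_{i,j} H_{\mathrm{B},(i,j)}(\gamma_{i,j}),
\end{aligned}
\]

with 

\[
\begin{aligned}
  U_{\mathrm{B},i} &: \Pi_{[n]} \to \mathbb{R}, \quad \boldsymbol{\gamma}_i \mapsto - \sum_j \gamma_{i,j} \cdot \log\bigl( \sqrt{\theta_{i,j}} \bigr), \\
  U_{\mathrm{B},j} &: \Pi_{[n]} \to \mathbb{R}, \quad \boldsymbol{\gamma}_j \mapsto - \sum_i \gamma_{i,j} \cdot \log\bigl( \sqrt{\theta_{i,j}} \bigr), \\
  H_{\mathrm{B},i} &: \Pi_{[n]} \to \mathbb{R}, \quad \boldsymbol{\gamma}_i \mapsto - \sum_j \gamma_{i,j} \cdot \log( \gamma_{i,j} ), \\
  H_{\mathrm{B},j} &: \Pi_{[n]} \to \mathbb{R}, \quad \boldsymbol{\gamma}_j \mapsto - \sum_i \gamma_{i,j} \cdot \log( \gamma_{i,j} ), \\
  H_{\mathrm{B},(i,j)} &: [0,1] \to \mathbb{R}, \quad \gamma_{i,j} \mapsto - \gamma_{i,j} \log( \gamma_{i,j} ) - (1 - \gamma_{i,j}) \log( 1 - \gamma_{i,j} ).
\end{aligned}
\]

Proof: This follows straightforwardly from Definition 10 and 
Lemma 13. □ 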



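Corollary 15 It holds that 

\[
  \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}) = \exp\Bigl( - \min_{\boldsymbol{\gamma} \in \Gamma_{n \times n}} F_{\mathrm{B}}(\boldsymbol{\gamma}) \Bigr),
\]

where 

\[
\begin{aligned}
  F_{\mathrm{B}}(\boldsymbol{\gamma}) &= U_{\mathrm{B}}(\boldsymbol{\gamma}) - H_{\mathrm{B}}(\boldsymbol{\gamma}), \\
  U_{\mathrm{B}}(\boldsymbol{\gamma}) &= - \sum_{i,j} \gamma_{i,j} \log( \theta_{i,j} ), \\
  H_{\mathrm{B}}(\boldsymbol{\gamma}) &= - \sum_{i,j} \gamma_{i,j} \log( \gamma_{i,j} ) + \sum_{i,j} (1 - \gamma_{i,j}) \log( 1 - \gamma_{i,j} ).
\end{aligned}
\]

Proof: This follows from Definitions 11 and 12 and from 
Lemma 14. □ 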
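
The expressions of Corollary 15 are simple enough to evaluate directly. The following sketch evaluates $F_{\mathrm{B}}(\boldsymbol{\gamma})$ at a given doubly stochastic matrix; it does not perform the minimization. Because $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ is defined via the minimum of $F_{\mathrm{B}}$, the value $\exp(-F_{\mathrm{B}}(\boldsymbol{\gamma}))$ at any feasible $\boldsymbol{\gamma}$ is a lower bound on it. The function names are hypothetical, $\boldsymbol{\theta}$ is assumed to be positive, and the convention $0 \cdot \log(0) = 0$ is built into the helper.

```python
from math import exp, isclose, log


def xlogx(x):
    """x * log(x) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * log(x)


def bethe_free_energy(theta, gamma):
    """F_B(gamma) = U_B(gamma) - H_B(gamma) as in Corollary 15 (theta assumed positive)."""
    n = len(theta)
    u_b = 0.0
    h_b = 0.0
    for i in range(n):
        for j in range(n):
            g = gamma[i][j]
            u_b -= g * log(theta[i][j])
            # minus sign on the first term, plus sign on the second term
            h_b += -xlogx(g) + xlogx(1.0 - g)
    return u_b - h_b


theta = [[1.0, 2.0], [3.0, 4.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
mixed = [[0.3, 0.7], [0.7, 0.3]]
print(exp(-bethe_free_energy(theta, identity)))  # weight of the identity permutation
print(exp(-bethe_free_energy(theta, mixed)))
```

At a permutation matrix the entropy term vanishes and $\exp(-F_{\mathrm{B}})$ reduces to the product of the selected entries of $\boldsymbol{\theta}$.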

If the sign in front of the second half of the expression 
for $H_{\mathrm{B}}(\boldsymbol{\gamma})$ in Corollary 15 were a minus sign, then $H_{\mathrm{B}}(\boldsymbol{\gamma})$ 
could be expressed as a sum of binary entropy functions, 
and therefore the concavity of $H_{\mathrm{B}}(\boldsymbol{\gamma})$ would be immediate. 
However, the presence of the plus sign means that a more 
careful look at $H_{\mathrm{B}}(\boldsymbol{\gamma})$ is required to determine if it is concave 
or not. 

Assumption 16 For the rest of this section we assume that
$n \geq 2$ and that $\theta$ is a positive matrix of size $n \times n$. This
simplifies the wording of most results without hurting their
generality too much. In practice, two possible ways to deal
with the issue of zero entries in $\theta$ are the following.

• One can change the matrix $\theta$ so that zero entries become
tiny positive entries.

• One can redefine $N(\theta)$ by removing the edge $e = (i,j)$,
along with redefining the local functions $g_i$ and $g_j$, if
$\theta_{i,j} = 0$. □



B. Concavity of the Bethe Entropy Function 
and Convexity of the Bethe Free Energy Function 

Towards showing that $H_B(\gamma)$ is a concave function of $\gamma$,
and subsequently that $F_B(\gamma)$ is a convex function of $\gamma$, we
first study two useful functions. Namely, in Definition 17 and
Lemma 18 we look at a function called $s$, and in Definition 19
and Theorem 20 we look at a function called $S$. Note that in
this section we use the short-hands $\sum_{\ell}$ and $\sum_{\ell' \neq \ell}$ for $\sum_{\ell \in [n]}$
and $\sum_{\ell' \in [n] : \ell' \neq \ell}$, respectively.

Definition 17 Let $s$ be the function

\[ s : [0,1] \to \mathbb{R}, \quad \xi \mapsto -\xi \log(\xi) + (1 - \xi) \log(1 - \xi). \]

Note that in contrast to the binary entropy function, there is
a plus sign (not a minus sign) in front of the second term. □

Lemma 18 The function $s$ that is specified in Definition 17
has the following properties.

• As can be seen from Figure 2 (left), the graph of the
function $s$ is s-shaped.

• The first-order derivative of $s$ is
\[ \frac{d}{d\xi} s(\xi) = -2 - \log\bigl( \xi (1 - \xi) \bigr). \]

• The second-order derivative of $s$ is
\[ \frac{d^2}{d\xi^2} s(\xi) = -\frac{1}{\xi} + \frac{1}{1 - \xi} = -\frac{1 - 2\xi}{\xi (1 - \xi)}. \]
Clearly, the function $s(\xi)$ is strictly concave in the
interval $0 \leq \xi < 1/2$ and strictly convex in the interval
$1/2 < \xi \leq 1$.

• The graph of $s$ has a point-symmetry at $(1/2, 0)$.

Proof: The proof of this lemma is based on straightforward
calculus and is therefore omitted. □



Definition 19 Let $S$ be the function

\[ S : \Pi_{[n]} \to \mathbb{R}, \quad \xi \mapsto \sum_{\ell} s(\xi_\ell) = -\sum_{\ell} \xi_\ell \log(\xi_\ell) + \sum_{\ell} (1 - \xi_\ell) \log(1 - \xi_\ell). \]
□

Figure 2 (right) shows the function $S(\xi)$ for $n = 3$. More
precisely, that plot shows the contour plot of the function
$S(\xi_1, \xi_2, 1 - \xi_1 - \xi_2)$.

Clearly, if the domain of the function $S$ were the set $[0,1]^n$,
then $S$ would not be concave everywhere because $s$ is not
concave everywhere. Therefore, the observation that is made
in the following theorem, namely that $S$ is concave, is
non-trivial.^8

Theorem 20 The function $S$ from Definition 19 is concave
and satisfies $S(\xi) \geq 0$ for all $\xi \in \Pi_{[n]}$. Moreover,

• For $n = 2$, it holds that $S(\xi) = 0$ for all $\xi \in \Pi_{[2]}$.

• For $n \geq 3$, the function $S$ is at almost all points in
its domain a strictly concave function. However, there
are points in its domain and corresponding directions
in which the function $S$ is linear.

Proof: See Appendix A. □

Lemma 21 The Bethe entropy function can be expressed in
terms of the function $S$ as follows:

\[ H_B : \Gamma_{n \times n} \to \mathbb{R}, \quad \gamma \mapsto \frac{1}{2} \sum_i S(\gamma_i) + \frac{1}{2} \sum_j S(\gamma_j). \]

^8 Because the function $s$ is concave in $[0, 1/2]$, the function $S$ is concave
in $\Pi_{[n]} \cap [0, 1/2]^n$. Therefore, as we will see, most of the work in the proof
of the upcoming theorem will be devoted to proving the concavity of the
function $S$ in $\Pi_{[n]} \setminus [0, 1/2]^n$.

Fig. 2. Left: plot of the function $s$, cf. Definition 17. Right: contour plot of
the function $S(\xi_1, \xi_2, 1 - \xi_1 - \xi_2)$, cf. Definition 19.



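Proof: This result follows from

\[ H_B(\gamma) \overset{(a)}{=} \frac{1}{2} \sum_i \Bigl( -\sum_j \gamma_{i,j} \log(\gamma_{i,j}) + \sum_j (1 - \gamma_{i,j}) \log(1 - \gamma_{i,j}) \Bigr) + \frac{1}{2} \sum_j \Bigl( -\sum_i \gamma_{i,j} \log(\gamma_{i,j}) + \sum_i (1 - \gamma_{i,j}) \log(1 - \gamma_{i,j}) \Bigr) \overset{(b)}{=} \frac{1}{2} \sum_i S(\gamma_i) + \frac{1}{2} \sum_j S(\gamma_j), \]

where at step (a) we have used Corollary 15 and where at
step (b) we have used Definition 19. □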
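
The properties of $s$ and $S$ and the identity of Lemma 21 can be spot-checked numerically. The sketch below is an illustration only and not part of the original paper; the helper names and the way a random doubly stochastic matrix is generated (a random convex combination of permutation matrices) are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def xlogx(x):
    """Elementwise x * log(x), with the convention 0 * log(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = x[pos] * np.log(x[pos])
    return out

def s(xi):
    """s(xi) = -xi*log(xi) + (1-xi)*log(1-xi), cf. Definition 17."""
    xi = np.asarray(xi, dtype=float)
    return -xlogx(xi) + xlogx(1.0 - xi)

def S(xi):
    """S(xi) = sum_l s(xi_l) for a probability vector xi, cf. Definition 19."""
    return float(np.sum(s(xi)))

def bethe_entropy(gamma):
    """H_B(gamma) as written in Corollary 15."""
    return float(np.sum(s(gamma)))

def bethe_entropy_via_S(gamma):
    """H_B(gamma) as written in Lemma 21 (rows and columns, half each)."""
    rows = sum(S(row) for row in gamma)
    cols = sum(S(col) for col in gamma.T)
    return 0.5 * rows + 0.5 * cols

def random_doubly_stochastic(n, num_perms=5):
    """Random convex combination of permutation matrices (illustrative)."""
    weights = rng.dirichlet(np.ones(num_perms))
    gamma = np.zeros((n, n))
    for w in weights:
        gamma += w * np.eye(n)[rng.permutation(n)]
    return gamma

gamma = random_doubly_stochastic(4)
gap = abs(bethe_entropy(gamma) - bethe_entropy_via_S(gamma))
```

A midpoint test such as $S((x+y)/2) \geq (S(x)+S(y))/2$ on random points of the simplex is a cheap sanity check of the concavity claimed in Theorem 20, although it is of course not a proof.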

Theorem 22 The Bethe entropy function $H_B(\gamma)$ is a concave
function of $\gamma \in \Gamma_{n \times n}$. Moreover, for all $\gamma \in \Gamma_{n \times n}$ it holds
that $H_B(\gamma) \geq 0$.

Proof: Lemma 21 showed that $H_B(\gamma)$ can be written as a sum
of $S$-functions. The concavity of $H_B(\gamma)$ then follows from
Theorem 20 and the fact that the sum of concave functions
is a concave function. Similarly, the non-negativity of $H_B(\gamma)$
follows from Theorem 20 and the fact that the sum of
non-negative functions is a non-negative function. □

Corollary 23 The Bethe free energy function $F_B(\gamma)$ is a
convex function of $\gamma \in \Gamma_{n \times n}$.

Proof: This follows from $F_B(\gamma) = U_B(\gamma) - H_B(\gamma)$ (cf. Corollary 15),
from the fact that $U_B(\gamma)$ is a linear function of $\gamma$
(cf. Corollary 15), and from the fact that $H_B(\gamma)$ is a concave
function of $\gamma$ (cf. Theorem 22). □

C. Behavior of the Bethe Entropy Function and the Bethe Free 
Energy Function at a Vertex of their Domain 

In this section we study the Bethe entropy function and 
the Bethe free energy function near a vertex of their domain. 
Because both functions can be expressed in terms of the 
function S, we first study the behavior of S near a vertex 
of its domain. 



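Lemma 24 Let

\[ \xi(t) \triangleq \check{\xi} + t \cdot \hat{\xi}, \]

where the vector $\check{\xi} \in \Pi_{[n]}$ is a vertex of $\Pi_{[n]}$ and where $\hat{\xi}$
is such that $\xi(t) \in \Pi_{[n]}$ for small non-negative $t$. This means
that there is an $\ell^* \in [n]$ such that $\check{\xi}$ satisfies $\check{\xi}_{\ell^*} = 1$ and
$\check{\xi}_\ell = 0$, $\ell \neq \ell^*$, and such that $\hat{\xi}$ satisfies $\hat{\xi}_{\ell^*} < 0$, $\hat{\xi}_\ell \geq 0$,
$\ell \neq \ell^*$, and $\sum_\ell \hat{\xi}_\ell = 0$. Then, for $0 < t \ll 1$, we have

\[ S(\xi(t)) = t \cdot |\hat{\xi}_{\ell^*}| \cdot \Bigl( -\sum_{\ell \neq \ell^*} \frac{\hat{\xi}_\ell}{|\hat{\xi}_{\ell^*}|} \log \frac{\hat{\xi}_\ell}{|\hat{\xi}_{\ell^*}|} \Bigr) + O(t^2), \qquad (4) \]

i.e., the function $S(\xi(t))$ can very well be approximated by a
linear function for $0 < t \ll 1$. Note that the coefficient of $t$
in (4) is non-negative.

Proof: See Appendix B. □

A word of caution: the behavior of the function $S$ is
somewhat special around a vertex $\check{\xi}$ of $\Pi_{[n]}$: namely, in general
there is no gradient vector $G$ such that $S(\check{\xi} + t \cdot \hat{\xi}) =
S(\check{\xi}) + t \cdot \sum_\ell G_\ell \hat{\xi}_\ell + O(t^2) = t \cdot \sum_\ell G_\ell \hat{\xi}_\ell + O(t^2)$ for $0 < t \ll 1$
and for all possible direction vectors $\hat{\xi}$.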
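
The linear approximation of Lemma 24 can be checked numerically for a concrete vertex and direction. The sketch below is illustrative only; the function names and the chosen direction vector are assumptions, and the predicted coefficient is $|\hat{\xi}_{\ell^*}|$ times the entropy of the normalized non-vertex components of $\hat{\xi}$.

```python
import numpy as np

def xlogx(x):
    """Elementwise x * log(x), with the convention 0 * log(0) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = x[pos] * np.log(x[pos])
    return out

def S(xi):
    """S(xi) = -sum xi*log(xi) + sum (1-xi)*log(1-xi)."""
    xi = np.asarray(xi, dtype=float)
    return float(-np.sum(xlogx(xi)) + np.sum(xlogx(1.0 - xi)))

def linear_coefficient(direction, l_star):
    """Coefficient of t in the expansion of S near the vertex with index l_star."""
    d = np.asarray(direction, dtype=float)
    a = abs(d[l_star])                 # |xi_hat_{l*}|
    p = np.delete(d, l_star) / a       # normalized remaining components
    return float(-a * np.sum(xlogx(p)))

# Illustrative vertex (l* = 0) and direction; the direction sums to zero.
vertex = np.array([1.0, 0.0, 0.0, 0.0])
direction = np.array([-1.0, 0.2, 0.3, 0.5])
t = 1e-4
slope_numeric = S(vertex + t * direction) / t
slope_predicted = linear_coefficient(direction, 0)
```

The two slopes agree up to a term of order $t$, which is what the $O(t^2)$ remainder in the expansion predicts.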

Lemma 24 has the following consequences for the behavior 
of the Bethe entropy function at a vertex of its domain. 



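Lemma 25 Let

\[ \gamma(t) \triangleq \check{\gamma} + t \cdot \hat{\gamma}, \]

where $\check{\gamma} \in \Gamma_{n \times n}$ is a vertex of $\Gamma_{n \times n}$ and where $\hat{\gamma}$ is such that
$\gamma(t) \in \Gamma_{n \times n}$ for small non-negative $t$. This means that $\check{\gamma}$
corresponds to the permutation $\sigma_{\check{\gamma}}$. (In the following statement
we will use the short-hands $\sigma \triangleq \sigma_{\check{\gamma}}$ and $\bar{\sigma} \triangleq \sigma_{\check{\gamma}}^{-1}$.) Then, for
$0 < t \ll 1$, we have

\[ H_B(\gamma(t)) = t \cdot \sum_i |\hat{\gamma}_{i,\sigma(i)}| \cdot \Bigl( -\sum_{j \neq \sigma(i)} \frac{\hat{\gamma}_{i,j}}{|\hat{\gamma}_{i,\sigma(i)}|} \log \frac{\hat{\gamma}_{i,j}}{|\hat{\gamma}_{i,\sigma(i)}|} \Bigr) + O(t^2), \]

\[ H_B(\gamma(t)) = t \cdot \sum_j |\hat{\gamma}_{\bar{\sigma}(j),j}| \cdot \Bigl( -\sum_{i \neq \bar{\sigma}(j)} \frac{\hat{\gamma}_{i,j}}{|\hat{\gamma}_{\bar{\sigma}(j),j}|} \log \frac{\hat{\gamma}_{i,j}}{|\hat{\gamma}_{\bar{\sigma}(j),j}|} \Bigr) + O(t^2), \]

i.e., the function $H_B(\gamma(t))$ can very well be approximated by
a linear function for $0 < t \ll 1$. Note that the coefficient of $t$
is non-negative.

Proof: See Appendix C. □

Assume that $\hat{\gamma}$ in Lemma 25 is chosen such that
$\sum_i |\hat{\gamma}_{i,\sigma(i)}| = 1$. (If this is not the case, then $\hat{\gamma}$ can be rescaled
by a positive real number such that this condition is satisfied.)
The coefficient of $t$ in the first display equation of Lemma 25
can be given the following meaning. It is the entropy rate of
the time-invariant Markov chain corresponding to the
(backtrackless) random walk on the NFG $N(\theta)$ (cf. Figure 1) with
the following properties:^9

• The probability of being at vertex $i \in \mathcal{I}$ is $|\hat{\gamma}_{i,\sigma(i)}|$.

• The probability of going to vertex $j \in \mathcal{J} \setminus \{\sigma(i)\}$,
conditioned on being at vertex $i \in \mathcal{I}$, is $|\hat{\gamma}_{i,j}| / |\hat{\gamma}_{i,\sigma(i)}|$.
The probability of going to vertex $\sigma(i) \in \mathcal{J}$, conditioned
on being at vertex $i \in \mathcal{I}$, is 0.

• The probability of being at vertex $j \in \mathcal{J}$ is $|\hat{\gamma}_{\bar{\sigma}(j),j}|$.

• The probability of going to vertex $\bar{\sigma}(j) \in \mathcal{I}$, conditioned
on being at vertex $j \in \mathcal{J}$, is 1.
The probability of going to vertex $i' \in \mathcal{I} \setminus \{\bar{\sigma}(j)\}$,
conditioned on being at vertex $j \in \mathcal{J}$, is 0.

The above two half-steps of the random walk can be combined
into one step.

• The probability of being at vertex $i \in \mathcal{I}$ is $|\hat{\gamma}_{i,\sigma(i)}|$.

• For $i, i' \in \mathcal{I}$ with $i \neq i'$, the probability of going to
vertex $\sigma(i')$ and then to vertex $i'$, conditioned on being
at vertex $i$, is $|\hat{\gamma}_{i,\sigma(i')}| / |\hat{\gamma}_{i,\sigma(i)}|$.

An analogous interpretation can be given to the coefficient
of $t$ in the second display equation of Lemma 25. Observe that
the condition $\sum_i |\hat{\gamma}_{i,\sigma(i)}| = 1$ is equivalent to the condition
$\sum_j |\hat{\gamma}_{\bar{\sigma}(j),j}| = 1$.

^9 For a discussion of the entropy rate of a time-invariant Markov chain, see,
e.g., [31, Section 4.2].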
Note that similar random walks appeared in the analysis of
the Bethe entropy function for so-called cycle codes (cf. [32])
and in the analysis of linear programming decoding of
low-density parity-check codes (cf. [33], which gives a random
walk interpretation of a result by Arora, Daskalakis, and
Steurer [34] and its extensions by Halabi and Even [35]).
Actually, given the fact that the symmetric difference of two
perfect matchings corresponds to a union of cycles in $N(\theta)$,
the similarity of the random walks here and of the random
walks in the above-mentioned context of cycle codes is not
totally surprising.

We come now to the main result of this subsection. Al- 
though this result is interesting in its own right, it will be 
especially important for the convergence analysis of the SPA 
in Section V. 



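Theorem 26 Let

\[ \gamma(t) \triangleq \check{\gamma} + t \cdot \hat{\gamma}, \]

where $\check{\gamma} \in \Gamma_{n \times n}$ is a vertex of $\Gamma_{n \times n}$ and where $\hat{\gamma}$ is such that
$\gamma(t) \in \Gamma_{n \times n}$ for small non-negative $t$. This means that $\check{\gamma}$
corresponds to the permutation $\sigma_{\check{\gamma}}$. (In the following statement
we will use the short-hands $\sigma \triangleq \sigma_{\check{\gamma}}$ and $\bar{\sigma} \triangleq \sigma_{\check{\gamma}}^{-1}$.) We also
assume that $\hat{\gamma}$ is normalized as follows:

\[ \sum_i |\hat{\gamma}_{i,\sigma(i)}| = \sum_j |\hat{\gamma}_{\bar{\sigma}(j),j}| = 1. \qquad (5) \]

Then, for $0 < t \ll 1$, we have

\[ F_B(\gamma(t)) \geq -\sum_i \log(\theta_{i,\sigma(i)}) - t \cdot \log(\rho) + O(t^2), \qquad (6) \]

where $\rho$ is the maximal (real) eigenvalue of the $n \times n$ matrix
$A$ with entries

\[ A_{i,i'} \triangleq \begin{cases} 0 & \text{(if } i = i') \\ \theta_{i,\sigma(i')} / \theta_{i,\sigma(i)} & \text{(otherwise)} \end{cases} \]

Note that equality holds in (6) for the matrix $\hat{\gamma}$ with entries

\[ \hat{\gamma}_{i,j} \triangleq \begin{cases} -\kappa \cdot u^L_i \cdot u^R_i & \text{(if } j = \sigma(i)) \\ +\kappa \cdot \dfrac{u^L_i \cdot \theta_{i,j} \cdot u^R_{\bar{\sigma}(j)}}{\rho \cdot \theta_{i,\sigma(i)}} & \text{(otherwise)} \end{cases} \]

where $u^L$ and $u^R$ are, respectively, the left and right
eigenvectors of $A$ with eigenvalue $\rho$, and where $\kappa$ is a suitable
normalization constant such that (5) is satisfied.

Proof: See Appendix D. □

Corollary 27 Consider a vertex $\check{\gamma}$ of $\Gamma_{n \times n}$ and define $\rho$ for
$\check{\gamma}$ as in Theorem 26.

• If $\rho < 1$ then $F_B$ has its unique minimum at $\check{\gamma}$.

• If $\rho > 1$ then $F_B$ is not minimal at $\check{\gamma}$.

Proof: From Theorem 26 we know that

\[ F_B(\gamma(t)) \geq -\sum_i \log(\theta_{i,\sigma(i)}) - t \cdot \log(\rho) + O(t^2), \]

with equality for the direction matrix $\hat{\gamma}$ that was specified
there. Moreover, from Corollary 23 we know that $F_B$ is convex
over $\Gamma_{n \times n}$. Therefore, if $\log(\rho) < 0$ (i.e., $\rho < 1$) then $F_B$ has
a unique minimum at $\check{\gamma}$. On the other hand, if $\log(\rho) > 0$
(i.e., $\rho > 1$) then $F_B$ cannot be minimal at $\check{\gamma}$.

Note that for $\log(\rho) = 0$ (i.e., $\rho = 1$), the minimality /
non-minimality of $F_B$ at $\check{\gamma}$ is determined by the $O(t^2)$ term.
□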
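
The vertex test of Corollary 27 is easy to try out numerically. The following sketch is not from the original paper: it assumes that the matrix $A$ of Theorem 26 has zero diagonal and off-diagonal entries $\theta_{i,\sigma(i')} / \theta_{i,\sigma(i)}$, and the function name `vertex_rho` is hypothetical. Because $A$ is non-negative, its eigenvalue of largest real part is real.

```python
import numpy as np

def vertex_rho(theta, sigma):
    """Maximal real eigenvalue of A for the vertex given by permutation sigma.

    Assumption: A[i, i'] = theta[i, sigma[i']] / theta[i, sigma[i]] for
    i != i', and A[i, i] = 0. theta is assumed to be a positive matrix.
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        for ip in range(n):
            if i != ip:
                A[i, ip] = theta[i, sigma[ip]] / theta[i, sigma[i]]
    return float(np.max(np.linalg.eigvals(A).real))

identity = [0, 1, 2]

# Diagonally dominant matrix: rho < 1, so F_B is minimal at this vertex.
theta_dominant = np.array([[10.0, 1.0, 1.0],
                           [1.0, 10.0, 1.0],
                           [1.0, 1.0, 10.0]])
rho_dominant = vertex_rho(theta_dominant, identity)

# All-one matrix: rho > 1, so F_B is not minimal at this vertex.
rho_ones = vertex_rho(np.ones((3, 3)), identity)
```

For the all-one $2 \times 2$ matrix the same computation gives $\rho = 1$, the borderline case in which the $O(t^2)$ term decides.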

Typically, the Bethe entropy function and the Bethe free
energy function have positive or negative infinite slope at a
vertex of their domain because of the appearance of terms
like $c \cdot t \cdot \log(t)$. However, because for the function $S$ all these
$c \cdot t \cdot \log(t)$ terms cancel in the vicinity of a vertex of its
domain (see the proof of Theorem 20, in particular Eq. (17)
in Appendix A-B), the slopes of the Bethe entropy function
and the Bethe free energy function are finite at a vertex of
their domain.

Let us conclude this section by pointing out that the obser- 
vations that were made in this subsection give an alternative 
viewpoint of some of the results that were presented in [14, 
Section 3]. 

V. Sum-Product-Algorithm-Based Search of the
Minimum of the Bethe Free Energy Function

Assumption 28 In this section we make the following two
assumptions, both with the goal of simplifying the wording of
most results without hurting their generality too much.

• We assume that $n \geq 2$ and that $\theta$ is a positive matrix
of size $n \times n$. In that respect, see also the comments in
Assumption 16.

• We assume that the minimum of the Bethe free energy
function $F_B$ is either in the interior of $\Gamma_{n \times n}$ or at a
vertex of $\Gamma_{n \times n}$, but not at a non-vertex boundary point
of $\Gamma_{n \times n}$. A possibility to guarantee this with probability 1
is to apply tiny random perturbations to the entries of $\theta$.
□

In Definition 12 we have defined the Bethe permanent of
a square matrix $\theta$ via the minimum of the Bethe free energy
function of the NFG $N(\theta)$. In Corollary 23 we have seen that
the Bethe free energy function is a convex function, i.e., it
behaves very favorably. This means that we could use any
generic optimization algorithm (see, e.g., [26], [36]) to find
the minimum of the Bethe free energy function, and with that
the Bethe permanent of $\theta$. However, given the special structure
of the optimization problem, there is the hope that there are
more efficient approaches.

A natural candidate for searching this minimum is the
SPA [28]-[30]. The reason for this is that a theorem by
Yedidia, Freeman, and Weiss [9] says that fixed points of
the SPA correspond to stationary points of the Bethe free
energy function.^10 Given the convexity of the Bethe free
energy function, the following two questions must therefore
be answered:

• If the minimum of $F_B$ is in the interior of $\Gamma_{n \times n}$, does
the SPA always converge to a fixed point?

• If the minimum of $F_B$ is at a vertex of $\Gamma_{n \times n}$, does the
SPA find that vertex?

In this section we answer both questions affirmatively,
independently of the matrix $\theta$, and (nearly) independently of the
chosen initial messages.

The rest of this section is structured as follows. First 
we discuss the details of the SPA message update rules in 
Section V-A. Afterwards, we state the SPA convergence result 
in Section V-B. 



A. Sum-Product Algorithm Message Update Rules 

In this subsection we derive the SPA message update rules
for the NFG $N(\theta)$ in Figure 1. Here we only give the technical
details; for a general discussion of the motivations behind the
SPA we refer to [28]-[30]. Note that in contrast to [8] we use
an undampened version of the SPA.

On a high level, the SPA works as follows. With every
edge in Figure 1 we associate a right-going message and a
left-going message. Every iteration of the SPA then consists
of two half-iterations: in the first half-iteration the right-going
messages are updated based on the left-going messages, and in
the second half-iteration the left-going messages are updated
based on the right-going messages. Finally, once some suitable
convergence criterion is met or a fixed number of iterations
has been reached, the pseudo-marginal vector (belief vector)
is computed based on the messages at the last iteration.

Mathematically, we define for every $t \geq 0$ and every edge
$(i,j) \in \mathcal{I} \times \mathcal{J}$ a left-going message $\overleftarrow{\mu}^{(t)}_{i,j} : \mathcal{A}_{i,j} \to \mathbb{R}$, and
for every $t \geq 1$ and every edge $(i,j) \in \mathcal{I} \times \mathcal{J}$ a right-going
message $\overrightarrow{\mu}^{(t)}_{i,j} : \mathcal{A}_{i,j} \to \mathbb{R}$.

For every left-going and for every right-going message it
turns out to be sufficient to keep track of the likelihood ratios

\[ \frac{\overleftarrow{\mu}^{(t)}_{i,j}(0)}{\overleftarrow{\mu}^{(t)}_{i,j}(1)}, \qquad \frac{\overrightarrow{\mu}^{(t)}_{i,j}(0)}{\overrightarrow{\mu}^{(t)}_{i,j}(1)}, \]

respectively. Actually, for the NFG under consideration it is
more convenient to deal with the inverses of these quantities,
and so we define the inverse likelihood ratios as follows:

\[ \overleftarrow{V}^{(t)}_{i,j} \triangleq \frac{\overleftarrow{\mu}^{(t)}_{i,j}(1)}{\overleftarrow{\mu}^{(t)}_{i,j}(0)}, \qquad \overrightarrow{V}^{(t)}_{i,j} \triangleq \frac{\overrightarrow{\mu}^{(t)}_{i,j}(1)}{\overrightarrow{\mu}^{(t)}_{i,j}(0)}. \]

^10 Strictly speaking, for NFGs with hard constraints, i.e., NFGs that contain
local functions that can assume the value zero for certain points in their
domain (which is the case for $N(\theta)$), this statement has only been proven for
interior stationary points of the Bethe free energy (cf. [9, Theorem 2]). For
SPA fixed points with some beliefs equal to zero it is only conjectured that
they correspond to edge-stationary points of the Bethe free energy function
(cf. discussion in [9, Section VI.D]).

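Lemma 29 Consider the NFG $N(\theta)$. The inverse likelihood
ratio update rules for the left-hand side and right-hand side
function nodes of $N(\theta)$ are given by, respectively,

\[ \overrightarrow{V}^{(t)}_{i,j} = \frac{\sqrt{\theta_{i,j}}}{\sum_{j' \neq j} \sqrt{\theta_{i,j'}} \cdot \overleftarrow{V}^{(t-1)}_{i,j'}}, \qquad t \geq 1, \ (i,j) \in \mathcal{I} \times \mathcal{J}, \]

\[ \overleftarrow{V}^{(t)}_{i,j} = \frac{\sqrt{\theta_{i,j}}}{\sum_{i' \neq i} \sqrt{\theta_{i',j}} \cdot \overrightarrow{V}^{(t)}_{i',j}}, \qquad t \geq 1, \ (i,j) \in \mathcal{I} \times \mathcal{J}. \]

The pseudo-marginal vector at the left-hand side and right-hand
side function nodes of $N(\theta)$ are given by, respectively,

\[ \beta^{(t)}_{i,a_i}\big|_{a_i = u_j} \propto \sqrt{\theta_{i,j}} \cdot \overleftarrow{V}^{(t)}_{i,j}, \qquad \beta^{(t)}_{j,a_j}\big|_{a_j = u_i} \propto \sqrt{\theta_{i,j}} \cdot \overrightarrow{V}^{(t)}_{i,j}. \]

Here the proportionality constants are defined such that for
every function node the beliefs sum to 1.

Proof: See Appendix E. □

Let us remark on the side that the above update equations
can be reformulated such that we only multiply by factors like
$\theta_{i,j}$ instead of by factors like $\sqrt{\theta_{i,j}}$. We leave the details to
the reader.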
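
To make the two half-iterations concrete, here is a minimal sketch of an undampened SPA on $N(\theta)$ in terms of inverse likelihood ratios. It is an illustration under stated assumptions, not the paper's reference implementation: it assumes the local functions of $N(\theta)$ take the values $\sqrt{\theta_{i,j}}$, uses the standard sum-product derivation for the updates, initializes all left-going ratios to 1, and reads off $\gamma_{i,j}$ from the edge beliefs. The function name is hypothetical.

```python
import numpy as np

def spa_sketch(theta, num_iters=200):
    """Run num_iters undampened SPA iterations; theta is assumed positive.

    Returns the edge-belief matrix gamma and the final inverse likelihood
    ratios (V_left, V_right).
    """
    root = np.sqrt(np.asarray(theta, dtype=float))
    V_left = np.ones_like(root)          # left-going ratios at t = 0
    V_right = np.ones_like(root)
    for _ in range(num_iters):
        # First half-iteration (left function nodes): for edge (i, j),
        # sum over the other edges j' != j at node i.
        w = root * V_left
        V_right = root / (w.sum(axis=1, keepdims=True) - w)
        # Second half-iteration (right function nodes): for edge (i, j),
        # sum over the other edges i' != i at node j.
        w = root * V_right
        V_left = root / (w.sum(axis=0, keepdims=True) - w)
    # Edge belief for a_e = 1 is proportional to V_left * V_right,
    # and for a_e = 0 it is proportional to 1.
    product = V_left * V_right
    gamma = product / (1.0 + product)
    return gamma, V_left, V_right

gamma, V_left, V_right = spa_sketch(np.ones((3, 3)))
```

If the minimum of the Bethe free energy lies at a vertex, some ratios diverge, so a practical implementation would probably work with logarithms of the ratios or renormalize them using the gauge freedom of the messages.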

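Remark 30 The SPA messages for the NFG $N(\theta)$ exhibit the
following property, a property that we will henceforth call
"message gauge invariance." Namely, consider the messages

\[ \bigl\{ \overleftarrow{V}^{(t)}_{i,j} \bigr\}_{i,j,t} \quad \text{and} \quad \bigl\{ \overrightarrow{V}^{(t)}_{i,j} \bigr\}_{i,j,t} \]

that are connected by the update equations in Lemma 29. It
is then easy to show that for any $C \in \mathbb{R}_{>0}$ the messages

\[ \bigl\{ C \cdot \overleftarrow{V}^{(t)}_{i,j} \bigr\}_{i,j,t} \quad \text{and} \quad \Bigl\{ \frac{1}{C} \cdot \overrightarrow{V}^{(t)}_{i,j} \Bigr\}_{i,j,t} \]

also satisfy the update equations in Lemma 29. Moreover, the
pseudo-marginals $\{ \beta^{(t)}_{i,a_i} \}$ and $\{ \beta^{(t)}_{j,a_j} \}$ are left
unchanged by this rescaling of the inverse likelihood ratios.
This is because the normalization that appears in the definition
of $\{ \beta^{(t)}_{i,a_i} \}$ and $\{ \beta^{(t)}_{j,a_j} \}$ removes the influence of
this message rescaling. □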

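Strictly speaking, the Bethe free energy function can only
be evaluated at fixed points of the SPA. However, very often
it is desirable to track the progress towards the minimal Bethe
free energy function value. This can be done via the so-called
pseudo-dual function of the Bethe free energy function [37],
[38]. This function has the following two properties: it can
be evaluated at any point during the SPA computations, and
at a fixed point of the SPA its value equals the value of the
Bethe free energy function. However, in general it is not a
non-increasing or a non-decreasing function of the iteration
number.

Lemma 31 Consider the NFG $N(\theta)$. For any set of left-going
messages $\{ \overleftarrow{V}_{i,j} \}_{i,j}$ and any set of right-going messages
$\{ \overrightarrow{V}_{i,j} \}_{i,j}$, the pseudo-dual function of the Bethe free energy
function is

\[ F^{\#}_{\mathrm{Bethe}}\bigl( \{ \overleftarrow{V}_{i,j} \}, \{ \overrightarrow{V}_{i,j} \} \bigr) = -\sum_i \log\Bigl( \sum_j \sqrt{\theta_{i,j}} \cdot \overleftarrow{V}_{i,j} \Bigr) - \sum_j \log\Bigl( \sum_i \sqrt{\theta_{i,j}} \cdot \overrightarrow{V}_{i,j} \Bigr) + \sum_{i,j} \log\bigl( 1 + \overrightarrow{V}_{i,j} \cdot \overleftarrow{V}_{i,j} \bigr). \]

Proof: See Appendix F. □

In particular, if desired, we can evaluate $F^{\#}_{\mathrm{Bethe}}$ after
every half-iteration of the SPA, i.e., we can compute
$F^{\#}_{\mathrm{Bethe}}( \{ \overleftarrow{V}^{(t-1)}_{i,j} \}, \{ \overrightarrow{V}^{(t)}_{i,j} \} )$ and $F^{\#}_{\mathrm{Bethe}}( \{ \overleftarrow{V}^{(t)}_{i,j} \}, \{ \overrightarrow{V}^{(t)}_{i,j} \} )$ for
every $t \geq 1$.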
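
A pseudo-dual function of this kind can be evaluated directly from the current inverse likelihood ratios. The sketch below is an illustration only: it assumes the three-term form $-\sum_i \log(\sum_j \sqrt{\theta_{i,j}} \overleftarrow{V}_{i,j}) - \sum_j \log(\sum_i \sqrt{\theta_{i,j}} \overrightarrow{V}_{i,j}) + \sum_{i,j} \log(1 + \overrightarrow{V}_{i,j} \overleftarrow{V}_{i,j})$, which is what the standard message-based expression for the Bethe partition function gives for $N(\theta)$, and the function name is hypothetical.

```python
import numpy as np

def pseudo_dual_bethe(theta, V_left, V_right):
    """Pseudo-dual Bethe free energy for given inverse likelihood ratios.

    Assumed form: one log term per left function node, one per right
    function node, and one per edge. theta is assumed positive.
    """
    root = np.sqrt(np.asarray(theta, dtype=float))
    term_left = -np.sum(np.log(np.sum(root * V_left, axis=1)))
    term_right = -np.sum(np.log(np.sum(root * V_right, axis=0)))
    term_edge = np.sum(np.log1p(V_left * V_right))
    return float(term_left + term_right + term_edge)

# Illustrative messages for the all-one 3 x 3 matrix. These particular
# values satisfy V_right = 1 / sum_{j' != j} V_left and vice versa.
theta = np.ones((3, 3))
V_left = np.ones((3, 3))
V_right = np.full((3, 3), 0.5)
value = np.exp(-pseudo_dual_bethe(theta, V_left, V_right))

# Rescaling the two message sets by C and 1/C leaves the value unchanged.
C = 3.7
value_rescaled = np.exp(-pseudo_dual_bethe(theta, C * V_left, V_right / C))
```

The invariance under the rescaling by $C$ and $1/C$ is consistent with the message gauge invariance of the update equations.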

B. Convergence of the Sum-Product Algorithm 

Note that there are rather few general results concerning the 
behavior of message-passing type algorithms for NFGs with 
cycles. For certain classes of graphical models and message- 
passing type algorithms, early results showed that under the 
assumption that the algorithm converges then the obtained 
estimates are correct (see, e.g., the results in [39], [40]). Later, 
conditions for convergence were established for a variety of 
graphical models and message-passing type algorithms (see, 
e.g., [41]-[44] and references therein). However, these results
do not seem to be applicable to the NFG under consideration
in this paper.

The SPA convergence proof that is the most relevant for the 
present paper is the one in the paper by Bayati and Nair [18] 
(see also the comments that we made about this paper in 
Section I-C). However, the fact that the graphical model in [18] 
counts matchings (and not only perfect matchings like here), 
implies a different behavior of the Bethe free energy function 
near the boundary of its domain, and so no separate analysis 
of interior and boundary minima of the Bethe free energy is 
required in the convergence proof in [18]. 

Note that, interestingly enough, establishing convergence for
the SPA on $N(\theta)$ is independent of the choice of $\theta$, which
is in contrast to, say, Gaussian graphical models where the
convergence behavior not only depends on the connectivity
of the underlying graph but also on the values of the
non-zero entries of the information matrix describing the Gaussian
graphical model. (Of course, the convergence speed of the SPA
on $N(\theta)$ does depend on the choice of $\theta$.)

Theorem 32 Consider the SPA for the NFG $N(\theta)$, for which the
message update rules were established in Lemma 29. For
any initial set of inverse likelihood ratios $\{ \overleftarrow{V}^{(0)}_{i,j} \}_{i,j}$ that
satisfies $0 < \overleftarrow{V}^{(0)}_{i,j} < \infty$, $(i,j) \in \mathcal{I} \times \mathcal{J}$, the
pseudo-marginals computed by the SPA converge to the
pseudo-marginals that minimize the Bethe free energy function of
$N(\theta)$. More precisely, we can make the following statements.^11

• If the minimum of $F_B$ is in the interior of $\Gamma_{n \times n}$, then the
inverse likelihood ratios

\[ \bigl\{ \overleftarrow{V}^{(t)}_{i,j} \bigr\}_{i,j,t} \quad \text{and} \quad \bigl\{ \overrightarrow{V}^{(t)}_{i,j} \bigr\}_{i,j,t} \]

stay bounded and converge (modulo the message gauge
invariance mentioned in Remark 30) to the fixed point
inverse likelihood ratios corresponding to the minimum
of $F_B$.

• If the minimum of $F_B$ is at the vertex $\check{\gamma}$ of $\Gamma_{n \times n}$, then
the inverse likelihood ratios satisfy

\[ \overrightarrow{V}^{(t)}_{i,j}\big|_{j = \sigma_{\check{\gamma}}(i)} \to \infty, \qquad \overrightarrow{V}^{(t)}_{i,j}\big|_{j \neq \sigma_{\check{\gamma}}(i)} \to 0, \]
\[ \overleftarrow{V}^{(t)}_{i,j}\big|_{j = \sigma_{\check{\gamma}}(i)} \to \infty, \qquad \overleftarrow{V}^{(t)}_{i,j}\big|_{j \neq \sigma_{\check{\gamma}}(i)} \to 0. \]

Finally,

\[ \Bigl| \exp\Bigl( -F^{\#}_{\mathrm{Bethe}}\bigl( \{ \overleftarrow{V}^{(t)}_{i,j} \}, \{ \overrightarrow{V}^{(t)}_{i,j} \} \bigr) \Bigr) - \mathrm{perm}_B(\theta) \Bigr| \leq C \cdot e^{-\nu \cdot t} \]

for some constants $C, \nu \in \mathbb{R}_{>0}$ that depend on the matrix $\theta$
and the initial messages.

Proof: See Appendix G. □

Explicit convergence speed estimates (in particular, values 
for C and v) can be extracted from the proof of Theorem 32. 
However, we think that a more sophisticated analysis might 
yield tighter convergence speed estimates; we leave this as an 
open problem for future research. 

VI. Finite-Graph-Cover Interpretation
of the Bethe Permanent

Note that the definition of the permanent of $\theta$ in Definition 1
has a "combinatorial flavor." In particular, it can be seen as
a sum over all weighted perfect matchings of a complete
bipartite graph. This is in contrast to the definition of the
Bethe permanent of $\theta$ (cf. Definitions 11 and 12) that has an
"analytical flavor." In this section we show that it is possible
to represent the Bethe permanent by an expression that has
a "combinatorial flavor." We do this by applying the results
from [11], which hold for general NFGs, to the NFG $N(\theta)$. The
key concept in that respect is that of so-called finite graph covers.
(We keep the discussion here somewhat brief and we refer
to [11] for all the details. See also [45].)

Definition 33 (see, e.g., [46], [47]) A cover of a graph $G$
with vertex set $V$ and edge set $E$ is a graph $\tilde{G}$ with vertex set
$\tilde{V}$ and edge set $\tilde{E}$, along with a surjection $\pi : \tilde{V} \to V$ which
is a graph homomorphism (i.e., $\pi$ takes adjacent vertices of
$\tilde{G}$ to adjacent vertices of $G$) such that for each vertex $v \in V$
and each $\tilde{v} \in \pi^{-1}(v)$, the neighborhood $\partial(\tilde{v})$ of $\tilde{v}$ is mapped
bijectively to $\partial(v)$. A cover is called an M-cover, where
$M \in \mathbb{Z}_{>0}$, if $|\pi^{-1}(v)| = M$ for every vertex $v$ in $V$.^12 □

^11 We remind the reader of the assumptions that were made in
Assumption 28.

Fig. 3. (a) NFG $N(\theta)$ for $n = 3$. (b) "Trivial" 4-cover of $N(\theta)$. (c) A
possible 4-cover of $N(\theta)$.

Because NFGs are graphs, it is straightforward to extend
this definition to NFGs. (Of course, the variables that are
associated with the M copies of an edge are allowed to take on
different values.) For an M-cover, the left-hand side function
nodes will be labeled by elements of $\mathcal{I} \times [M]$, the right-hand
side function nodes will be labeled by elements of $\mathcal{J} \times [M]$,
and the edges will be labeled by elements of a cover-dependent
subset of $\mathcal{I} \times [M] \times \mathcal{J} \times [M]$.

Example 34 Let $n = 3$. The NFG $N(\theta)$ is shown in
Figure 3(a). There is only one 1-cover of $N(\theta)$, namely $N(\theta)$ itself.
Two possible 4-covers of $N(\theta)$ are shown in Figures 3(b)-(c).
The 4-cover in Figure 3(b) is "trivial" in the sense that it
consists of 4 disconnected copies of $N(\theta)$. On the other hand,
the 4-cover in Figure 3(c) is "nontrivial" in the sense that it
consists of 4 copies of $N(\theta)$ that are intertwined. □

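Lemma 35 Let $\tilde{\mathcal{N}}_M(\theta)$ be the set of all M-covers $\tilde{N}$ of $N(\theta)$.
It holds that

\[ \bigl| \tilde{\mathcal{N}}_M(\theta) \bigr| = (M!)^{(n^2)}. \qquad (7) \]

Proof: This follows from [11, Lemma 15] and the fact that
the NFG $N(\theta)$ has $n^2$ full edges. □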

The following definition is the main definition of this 
section. 

Definition 36 For any $M \in \mathbb{Z}_{>0}$ we define the degree-M
Bethe permanent of $\theta$ to be

\[ \mathrm{perm}_{B,M}(\theta) \triangleq \sqrt[M]{ \bigl\langle Z_G(\tilde{N}) \bigr\rangle_{\tilde{N} \in \tilde{\mathcal{N}}_M} }, \]

where the angular brackets represent the arithmetic average
of $Z_G(\tilde{N})$ over all $\tilde{N} \in \tilde{\mathcal{N}}_M$. (Note that the right-hand side is
based on the Gibbs partition function, not the Bethe partition
function.) □

^12 The number M is also known as the degree of the cover. (Not to be
confused with the degree of a vertex.)

As we will now show, one can express $Z_G(\tilde{N})$ for any M-cover
$\tilde{N}$ of $N(\theta)$ as the permanent of some matrix that is
derived from $\theta$.

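Definition 37 For any $M \in \mathbb{Z}_{>0}$ we define $\Psi_M$ to be the set

\[ \Psi_M \triangleq \Bigl\{ P = \bigl\{ P^{(i,j)} \bigr\}_{i \in \mathcal{I}, j \in \mathcal{J}} \Bigm| P^{(i,j)} \in \mathcal{P}_{M \times M} \Bigr\}. \]

Moreover, for $P \in \Psi_M$ we define the P-lifting of $\theta$ to be the
following $(nM) \times (nM)$ matrix:

\[ \theta^{\uparrow P} \triangleq \begin{pmatrix} \theta_{1,1} P^{(1,1)} & \cdots & \theta_{1,n} P^{(1,n)} \\ \vdots & & \vdots \\ \theta_{n,1} P^{(n,1)} & \cdots & \theta_{n,n} P^{(n,n)} \end{pmatrix}. \]
□

For any positive integer M it is straightforward to see that
there is a bijection between the set $\tilde{\mathcal{N}}_M(\theta)$ of all M-covers
of $N(\theta)$ and the set $\{ \theta^{\uparrow P} \}_{P \in \Psi_M}$. In particular, because of
Lemma 8, for an M-cover $\tilde{N}$ and its corresponding matrix
$\theta^{\uparrow P}$ it holds that $Z_G(\tilde{N}) = \mathrm{perm}(\theta^{\uparrow P})$. Therefore, we have
the following reformulation of Definition 36.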

Definition 38 (Reformulation of Definition 36) For any
$M \in \mathbb{Z}_{>0}$ we define the degree-M Bethe permanent of $\theta$ to
be

\[ \mathrm{perm}_{B,M}(\theta) \triangleq \sqrt[M]{ \bigl\langle \mathrm{perm}(\theta^{\uparrow P}) \bigr\rangle_{P \in \Psi_M} }, \qquad (8) \]

where the angular brackets represent the arithmetic average
of $\mathrm{perm}(\theta^{\uparrow P})$ over all $P \in \Psi_M$. (Note that the permanent,
not the Bethe permanent, appears on the right-hand side of
the above expression.) □

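In order to better appreciate the right-hand side of the
above expression, it is worthwhile to make the following two
observations.

• For $M = 1$, the averaging is trivial because $\Psi_M$ contains
only one element. Moreover, letting $P$ be this single
element, it holds that $\theta^{\uparrow P} = \theta$. Therefore

\[ \mathrm{perm}_{B,1}(\theta) = \mathrm{perm}(\theta). \]

• For any $M \in \mathbb{Z}_{>0}$, the "trivial" M-cover of $N(\theta)$ is given
by the choice $P = \{ P^{(i,j)} \}_{(i,j) \in \mathcal{I} \times \mathcal{J}}$ with $P^{(i,j)} \triangleq I$,
$(i,j) \in \mathcal{I} \times \mathcal{J}$, where $I$ is the identity matrix of size
$M \times M$. For this M-cover we obtain

\[ \mathrm{perm}(\theta^{\uparrow P}) = \mathrm{perm}(\theta)^M, \qquad \sqrt[M]{\mathrm{perm}(\theta^{\uparrow P})} = \mathrm{perm}(\theta). \]

Fig. 4. The degree-M Bethe permanent $\mathrm{perm}_{B,M}(\theta)$ of the non-negative matrix $\theta$ for
different values of M, ranging from $\mathrm{perm}_{B,M}(\theta)|_{M=1} = \mathrm{perm}(\theta)$ to
$\mathrm{perm}_{B,M}(\theta)|_{M \to \infty} = \mathrm{perm}_B(\theta)$.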



With this, we are ready for the main result of this section. 



Theorem 39 It holds that

\[ \limsup_{M \to \infty} \mathrm{perm}_{B,M}(\theta) = \mathrm{perm}_B(\theta). \]

Proof: This follows from Definitions 12 and 38, along with
the application of [11, Theorem 19] to $N = N(\theta)$. □
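
For very small $n$ and $M$, the average over all P-liftings in Definition 38 can be evaluated by brute force. The sketch below is illustrative only: the function names are hypothetical, the permanent is computed naively by summing over all permutations, and the number of liftings $(M!)^{n^2}$ grows so quickly that this is only feasible for toy sizes.

```python
import itertools
import math
import numpy as np

def permanent(a):
    """Naive permanent: sum over all permutations of products of entries."""
    n = a.shape[0]
    return sum(
        math.prod(a[i, p[i]] for i in range(n))
        for p in itertools.permutations(range(n))
    )

def permutation_matrices(M):
    """All M! permutation matrices of size M x M."""
    eye = np.eye(M)
    return [eye[list(p)] for p in itertools.permutations(range(M))]

def degree_M_bethe_permanent(theta, M):
    """M-th root of the average permanent over all P-liftings of theta."""
    theta = np.asarray(theta, dtype=float)
    n = theta.shape[0]
    mats = permutation_matrices(M)
    total, count = 0.0, 0
    for choice in itertools.product(mats, repeat=n * n):
        # Block (i, j) of the lifted matrix is theta[i, j] * P^(i,j).
        lifted = np.block(
            [[theta[i, j] * choice[i * n + j] for j in range(n)]
             for i in range(n)]
        )
        total += permanent(lifted)
        count += 1
    return (total / count) ** (1.0 / M)

theta = np.ones((2, 2))
degree_1 = degree_M_bethe_permanent(theta, 1)
degree_2 = degree_M_bethe_permanent(theta, 2)
```

For the all-one $2 \times 2$ matrix, the degree-1 value is the permanent itself, and the degree-2 value is already strictly smaller.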

Theorem 39, together with the relation $\mathrm{perm}_{B,1}(\theta) =
\mathrm{perm}(\theta)$, is visualized in Figure 4. Because the permanents
that appear on the right-hand side of (8) are combinatorial
objects, Definition 38 and Theorem 39 give the promised
"combinatorial characterization" of the Bethe permanent.

A. The Bethe Permanent for Matrices of Size 2x2 

In this and the following subsections we illustrate the
concepts and results that have been presented so far in this
section by having a detailed look at the case $n = 2$, i.e., we
study the permanent, the Bethe permanent, and the degree-M
Bethe permanent for the matrix

\[ \theta = \begin{pmatrix} \theta_{1,1} & \theta_{1,2} \\ \theta_{2,1} & \theta_{2,2} \end{pmatrix}. \]

The corresponding NFG $N(\theta)$ is shown in Figure 5(a). Of
course, nobody would use the Bethe permanent to approximate
the permanent of a $2 \times 2$ matrix. However, this case gives some good
insights into the strengths and the weaknesses of the Bethe
approximation to the permanent.

Lemma 40 For $n = 2$ it holds that

\[ \mathrm{perm}(\theta) = \theta_{1,1} \theta_{2,2} + \theta_{2,1} \theta_{1,2}, \]
\[ \mathrm{perm}_B(\theta) = \max(\theta_{1,1} \theta_{2,2}, \ \theta_{2,1} \theta_{1,2}). \]

Proof: The result for $\mathrm{perm}(\theta)$ follows from Definition 1.
On the other hand, in order to obtain $\mathrm{perm}_B(\theta)$, we apply
Corollary 15. The crucial step in Corollary 15 is to minimize
$F_B(\gamma)$ over $\gamma \in \Gamma_{2 \times 2}$. Because $H_B(\gamma) = 0$, $\gamma \in \Gamma_{2 \times 2}$,
minimizing $F_B(\gamma)$ is equivalent to minimizing $U_B(\gamma) =
-\sum_{i,j} \gamma_{i,j} \log(\theta_{i,j})$.

• For $\theta_{1,1} \theta_{2,2} = \theta_{1,2} \theta_{2,1}$ the minimum is achieved at every
$\gamma \in \Gamma_{2 \times 2}$.

• For $\theta_{1,1} \theta_{2,2} > \theta_{1,2} \theta_{2,1}$ the minimum is achieved at
$\gamma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

• For $\theta_{1,1} \theta_{2,2} < \theta_{1,2} \theta_{2,1}$ the minimum is achieved at
$\gamma = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$.
□

Fig. 5. Graphs (NFGs) that are discussed in Sections VI-B-VI-D. (a) Base graph. (b)-(c) Perfect matchings of the graph in (a). (d) A possible double cover
of the graph in (a). (e)-(h) Perfect matchings of the graph in (d). (i) A possible double cover of the graph in (a). (j)-(k) Perfect matchings of the graph in (i).



Example 41 For $n = 2$ and $\theta_{i,j} = 1$, $(i,j) \in \mathcal{I} \times \mathcal{J}$, we have

\[ \mathrm{perm}(\theta) = 2, \]
\[ \mathrm{perm}_B(\theta) = 1. \]

Recall that $\mathrm{perm}(\theta)$ represents the sum of all the weighted
perfect matchings of the complete bipartite graph $N(\theta)$, and
so, for the special choice $\theta_{i,j} = 1$, $(i,j) \in \mathcal{I} \times \mathcal{J}$, the
quantity $\mathrm{perm}(\theta)$ represents the number of perfect matchings
of $N(\theta)$. As is illustrated in Figures 5(b)-(c), the graph $N(\theta)$
has two perfect matchings, thereby combinatorially verifying
$\mathrm{perm}(\theta) = 2$. □



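B. The Degree-M Bethe Permanent for Matrices of Size 2 × 2:
Initial Considerations

One of the goals of this and the next subsections is to obtain
a better combinatorial understanding of the result $\mathrm{perm}_B(\theta) =
1$ for $n = 2$, in particular, why it is different from $\mathrm{perm}(\theta)$,
yet not too different.

Towards this goal, let us study the degree-M Bethe permanent
of $\theta$ as specified in Definition 38. Therein, the average
is taken over $|\Psi_M| = (M!)^{n^2} = (M!)^4$ matrices

\[ \theta^{\uparrow P} = \begin{pmatrix} \theta_{1,1} P_{1,1} & \theta_{1,2} P_{1,2} \\ \theta_{2,1} P_{2,1} & \theta_{2,2} P_{2,2} \end{pmatrix}, \qquad P \in \Psi_M. \]

We can simplify the analysis by realizing that the permanent
of $\theta^{\uparrow P}$ equals the permanent of a modified matrix, where
the first block row is multiplied from the left by $P_{1,1}^{-1}$, where
the second block row is multiplied from the left by $P_{2,1}^{-1}$, and
where the second block column is multiplied from the right
by $P_{1,2}^{-1} \cdot P_{1,1}$, i.e.,

\[ \mathrm{perm}(\theta^{\uparrow P}) = \mathrm{perm} \begin{pmatrix} \theta_{1,1} I & \theta_{1,2} I \\ \theta_{2,1} I & \theta_{2,2} P_{2,1}^{-1} P_{2,2} P_{1,2}^{-1} P_{1,1} \end{pmatrix}, \]

where $I$ is the identity matrix of size $M \times M$. Therefore, we
can rewrite $\mathrm{perm}_{B,M}(\theta)$ as follows:

\[ \mathrm{perm}_{B,M}(\theta) = \sqrt[M]{ \left\langle \mathrm{perm} \begin{pmatrix} \theta_{1,1} I & \theta_{1,2} I \\ \theta_{2,1} I & \theta_{2,2} P_{2,2} \end{pmatrix} \right\rangle_{P_{2,2} \in \mathcal{P}_{M \times M}} }, \qquad (9) \]

i.e., an average over the $M!$ permutation matrices of size
$M \times M$.

C. The Degree-M Bethe Permanent for Matrices of Size 2 × 2:
All-One Matrix

In this subsection we consider the cases $M = 2$, $M = 3$,
and general M for the special choice

\[ \theta = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}. \]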



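Example 42 Let $n = 2$, $M = 2$, and $\theta_{i,j} = 1$, $(i,j) \in \mathcal{I} \times \mathcal{J}$.
We make the following observations.

• The average in (9) is over $2! = 2$ matrices, namely over

\[ \theta^{\uparrow(1)} \triangleq \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}, \qquad \theta^{\uparrow(2)} \triangleq \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}. \]

• The matrix $\theta^{\uparrow(1)}$ corresponds to the double cover of $N(\theta)$
shown in Figure 5(d). Because that graph has 4 perfect
matchings, cf. Figures 5(e)-(h), we have

\[ \mathrm{perm}(\theta^{\uparrow(1)}) = 4. \]

• The matrix $\theta^{\uparrow(2)}$ corresponds to the double cover of $N(\theta)$
shown in Figure 5(i). Because that graph has 2 perfect
matchings, cf. Figures 5(j)-(k), we have

\[ \mathrm{perm}(\theta^{\uparrow(2)}) = 2. \]

Putting everything together, we obtain the degree-2 Bethe
permanent of $\theta$, i.e.,

\[ \mathrm{perm}_{B,2}(\theta) = \sqrt{\frac{1}{2!} \cdot (4 + 2)} = \sqrt{\frac{1}{2} \cdot 6} = \sqrt{3} \approx 1.732. \]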

Fig. 6. Graphs (NFGs) that are discussed in Sections VI-B-VI-D. (a) Base graph. (b)-(g) Possible triple covers of the graph in (a), with
8, 4, 4, 4, 2, and 2 perfect matchings, respectively.

Fig. 7. The four perfect matchings of the triple cover in Figure 6(c).

We note that the graph in Figure 5(d) consists of M independent
copies of the graph in Figure 5(a), therefore it is not
surprising that $\mathrm{perm}(\theta^{\uparrow(1)}) = \mathrm{perm}(\theta)^M = 2^2 = 4$. On the
other hand, the graph in Figure 5(i) consists of M coupled
copies of the graph in Figure 5(a), which implies that we
cannot choose the perfect matchings independently. Therefore,
it is not surprising that we have $\mathrm{perm}(\theta^{\uparrow(2)}) \neq \mathrm{perm}(\theta)^M =
2^2 = 4$, which finally results in $\mathrm{perm}_{B,2}(\theta) \neq \mathrm{perm}(\theta)$.
Nevertheless, these considerations also show why $\mathrm{perm}_{B,2}(\theta)$
is not too different from $\mathrm{perm}(\theta)$. □

Example 43 Let $n = 2$, $M = 3$, and $\theta_{i,j} = 1$, $(i,j) \in \mathcal{I} \times \mathcal{J}$.
The average in (9) is over $3! = 6$ matrices. These matrices
correspond to the triple covers of $N(\theta)$ shown in Figure 6(b)-(g).
Computing the number of perfect matchings for each of
these cases, we obtain

\[ \mathrm{perm}_{B,3}(\theta) = \sqrt[3]{\frac{1}{3!} \cdot (8 + 4 + 4 + 4 + 2 + 2)} = \sqrt[3]{\frac{1}{3!} \cdot 24} = \sqrt[3]{4} \approx 1.587. \]

In particular, for the triple cover in Figure 6(c) we show its
4 perfect matchings explicitly in Figure 7.

Overall, we can make similar observations as at the end
of Example 42 concerning the coupling of the M copies of
$N(\theta)$ that make up a degree-M cover and its influence on the
number of perfect matchings. □

Example 44 Let $n = 2$, $M \in \mathbb{Z}_{>0}$, and $\theta_{i,j} = 1$, $(i,j) \in \mathcal{I} \times \mathcal{J}$. The average in (9) is over $M!$ matrices that correspond to the $M$-covers of $\mathsf{N}(\boldsymbol{\theta})$. For each of these matrices, its permanent equals the number of perfect matchings in the corresponding $M$-cover. We make the following observations (see Figures 5-7 for illustrations for the cases $M = 2$ and $M = 3$).

• Every $M$-cover consists of up to $M$ cycles.

• Every cycle supports two perfect matchings (independently of the cycle length and independently of the perfect matchings chosen on the rest of the graph).

Therefore, if an $M$-cover has $c$ cycles then it has $2^c$ perfect matchings. The average in (9) can then be evaluated with suitable combinatorial tools, for example by using the so-called cycle index of the symmetric group over $M$ elements (see, e.g., [48]), and we obtain
\[
\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta}) = \sqrt[M]{M + 1}.
\]
Therefore, in the limit $M \to \infty$, we obtain
\[
\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}) = \limsup_{M \to \infty} \operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta}) = 1.
\]
This confirms the result for $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ in Example 41, which
was obtained by analytical means. D 
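The counting argument of Examples 42 to 44 can be checked by brute force for small $M$. The following sketch is only an illustration under stated assumptions: it assumes that the $M!$ lifted matrices in (9) are obtained by lifting the entries $(1,1)$, $(1,2)$, $(2,1)$ with the identity and the entry $(2,2)$ with an arbitrary permutation (this is the block structure of the matrices shown in Example 45), and the function names `permanent`, `lifted_matrix`, and `average_cover_permanent` are hypothetical helpers, not part of the paper.

```python
from itertools import permutations
from math import factorial


def permanent(matrix):
    """Permanent by direct expansion over all permutations (tiny matrices only)."""
    size = len(matrix)
    total = 0
    for sigma in permutations(range(size)):
        term = 1
        for row, col in enumerate(sigma):
            term *= matrix[row][col]
            if term == 0:
                break
        total += term
    return total


def lifted_matrix(theta, sigma):
    """2M x 2M lift of a 2x2 matrix theta.

    Assumption: entries (1,1), (1,2), (2,1) are lifted with the identity and
    entry (2,2) with the permutation sigma, as in the matrices of Example 45.
    """
    m = len(sigma)
    lift = [[0] * (2 * m) for _ in range(2 * m)]
    for k in range(m):
        lift[k][k] = theta[0][0]
        lift[k][m + k] = theta[0][1]
        lift[m + k][k] = theta[1][0]
        lift[m + k][m + sigma[k]] = theta[1][1]
    return lift


def average_cover_permanent(theta, m):
    """Arithmetic mean of perm(lift) over all M! choices of sigma."""
    total = sum(permanent(lifted_matrix(theta, s)) for s in permutations(range(m)))
    return total / factorial(m)


if __name__ == "__main__":
    ones = [[1, 1], [1, 1]]
    for m in range(1, 5):
        avg = average_cover_permanent(ones, m)
        # avg is the M-th power of the degree-M Bethe permanent
        print(m, avg, avg ** (1.0 / m))
```

The direct expansion of the permanent is exponential in the matrix size, so this check is only practical for very small $M$; it is meant to mirror the perfect-matching counts of Figures 5 and 6, not to be an efficient algorithm.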

D. The Degree-M Bethe Permanent for Matrices of Size 2x2 
— General Non-Negative Matrix 

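In this subsection we consider the cases $M = 2$, $M = 3$, and general $M$ for the general non-negative matrix
\[
\boldsymbol{\theta} = \begin{pmatrix} \theta_{1,1} & \theta_{1,2} \\ \theta_{2,1} & \theta_{2,2} \end{pmatrix}.
\]
A particular goal of this subsection is to compare the degree-$M$ Bethe permanent of $\boldsymbol{\theta}$ with the permanent of $\boldsymbol{\theta}$. In fact, as we will see, for every considered case in this subsection we have $\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta}) \leq \operatorname{perm}(\boldsymbol{\theta})$.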

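Example 45 Let $n = 2$ and $M = 2$. We perform similar computations as in Example 42, but for a general non-negative matrix $\boldsymbol{\theta}$. Towards computing $\operatorname{perm}_{\mathrm{B},2}(\boldsymbol{\theta})$ as given in (9), we make the following observations.

• The average in (9) is over $2! = 2$ matrices, namely over
\[
\boldsymbol{\theta}^{\uparrow(1)} \triangleq
\begin{pmatrix}
\theta_{1,1} & 0 & \theta_{1,2} & 0 \\
0 & \theta_{1,1} & 0 & \theta_{1,2} \\
\theta_{2,1} & 0 & \theta_{2,2} & 0 \\
0 & \theta_{2,1} & 0 & \theta_{2,2}
\end{pmatrix},
\qquad
\boldsymbol{\theta}^{\uparrow(2)} \triangleq
\begin{pmatrix}
\theta_{1,1} & 0 & \theta_{1,2} & 0 \\
0 & \theta_{1,1} & 0 & \theta_{1,2} \\
\theta_{2,1} & 0 & 0 & \theta_{2,2} \\
0 & \theta_{2,1} & \theta_{2,2} & 0
\end{pmatrix}.
\]

• We obtain
\[
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow(1)}\bigr)
= (\theta_{1,1}\theta_{2,2} + \theta_{1,2}\theta_{2,1})^2
= \theta_{1,1}^2\theta_{2,2}^2 + 2\,\theta_{1,1}\theta_{1,2}\theta_{2,1}\theta_{2,2} + \theta_{1,2}^2\theta_{2,1}^2.
\]
Note that the coefficients add up to 4 because $\boldsymbol{\theta}^{\uparrow(1)}$ corresponds to the double cover of $\mathsf{N}(\boldsymbol{\theta})$ shown in Figure 5(d), which admits 4 (weighted) perfect matchings.

• We obtain
\[
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow(2)}\bigr)
= \theta_{1,1}^2\theta_{2,2}^2 + \theta_{1,2}^2\theta_{2,1}^2.
\]
Note that the coefficients add up to 2 because $\boldsymbol{\theta}^{\uparrow(2)}$ corresponds to the double cover of $\mathsf{N}(\boldsymbol{\theta})$ shown in Figure 5(i), which admits 2 (weighted) perfect matchings.

Putting everything together, we obtain for the square of the degree-2 Bethe partition function of $\boldsymbol{\theta}$
\[
\bigl(\operatorname{perm}_{\mathrm{B},2}(\boldsymbol{\theta})\bigr)^2
= \frac{1}{2} \cdot \Bigl( \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow(1)}\bigr) + \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow(2)}\bigr) \Bigr)
= \theta_{1,1}^2\theta_{2,2}^2 + \theta_{1,1}\theta_{1,2}\theta_{2,1}\theta_{2,2} + \theta_{1,2}^2\theta_{2,1}^2.
\]
Given the observations that
\[
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow(1)}\bigr) \leq \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^2,
\qquad
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow(2)}\bigr) \leq \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^2,
\]
it is not surprising that we also have the inequality
\[
\bigl(\operatorname{perm}_{\mathrm{B},2}(\boldsymbol{\theta})\bigr)^2 \leq \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^2,
\]
i.e.,
\[
\operatorname{perm}_{\mathrm{B},2}(\boldsymbol{\theta}) \leq \operatorname{perm}(\boldsymbol{\theta}).
\]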



D 



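Example 46 Let $n = 2$ and $M = 3$. We perform similar computations as in Example 43, but for a general non-negative matrix $\boldsymbol{\theta}$. Towards computing $\operatorname{perm}_{\mathrm{B},3}(\boldsymbol{\theta})$ as given in (9), we make the following observations.

• The average in (9) is over $3! = 6$ matrices. These matrices correspond to the triple covers of $\mathsf{N}(\boldsymbol{\theta})$ shown in Figure 6(b)-(g).

• For example, for the matrix $\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}$ corresponding to the triple cover in Figure 6(c), we obtain
\[
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)
= \theta_{1,1}^3\theta_{2,2}^3
+ \theta_{1,1}^1\theta_{1,2}^2\theta_{2,1}^2\theta_{2,2}^1
+ \theta_{1,1}^2\theta_{1,2}^1\theta_{2,1}^1\theta_{2,2}^2
+ \theta_{1,2}^3\theta_{2,1}^3,
\]
where each (weighted) perfect matching in Figure 7 contributes one monomial to the above expression. One can verify that
\begin{align*}
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)
&= (\theta_{1,1}\theta_{2,2} + \theta_{1,2}\theta_{2,1}) \cdot (\theta_{1,1}^2\theta_{2,2}^2 + \theta_{1,2}^2\theta_{2,1}^2) \\
&\leq (\theta_{1,1}\theta_{2,2} + \theta_{1,2}\theta_{2,1}) \cdot (\theta_{1,1}\theta_{2,2} + \theta_{1,2}\theta_{2,1})^2 \\
&= (\theta_{1,1}\theta_{2,2} + \theta_{1,2}\theta_{2,1})^3 \\
&= \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^3.
\end{align*}
(The product expression in the first line is not surprising given the fact that the graph in Figure 6(c) contains two independent components, each contributing one factor to the above product.)

Similar observations can be made for the other five triple covers in Figure 6(b)-(g), and so we obtain
\[
\bigl(\operatorname{perm}_{\mathrm{B},3}(\boldsymbol{\theta})\bigr)^3 \leq \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^3,
\]
i.e.,
\[
\operatorname{perm}_{\mathrm{B},3}(\boldsymbol{\theta}) \leq \operatorname{perm}(\boldsymbol{\theta}).
\]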



D 



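Example 47 Let $n = 2$ and $M \in \mathbb{Z}_{>0}$. We perform similar computations as in Example 44, but for a general non-negative matrix $\boldsymbol{\theta}$. The observations that we made there can be generalized (beyond the all-one matrix), and we obtain
\[
\bigl(\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta})\bigr)^M
= \sum_{\ell=0}^{M} (\theta_{1,1}\theta_{2,2})^{M-\ell} (\theta_{1,2}\theta_{2,1})^{\ell}.
\]
Because
\[
\bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^M
= \sum_{\ell=0}^{M} \binom{M}{\ell} (\theta_{1,1}\theta_{2,2})^{M-\ell} (\theta_{1,2}\theta_{2,1})^{\ell},
\]
we see that
\[
\bigl(\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta})\bigr)^M \leq \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^M,
\]
i.e.,
\[
\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta}) \leq \operatorname{perm}(\boldsymbol{\theta}).
\]
Moreover, in the limit $M \to \infty$, we have
\[
\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})
= \limsup_{M \to \infty} \operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta})
= \max(\theta_{1,1}\theta_{2,2},\, \theta_{2,1}\theta_{1,2}).
\]
This confirms the result for $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ in Lemma 40, which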
was obtained by analytical means. D 
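The closed-form expression in Example 47 can be cross-checked numerically with the cycle argument of Example 44. The sketch below is an illustration under an assumption that is not spelled out in the text: a cycle of the cover that stems from a length-$k$ cycle of the lifting permutation is taken to support exactly two weighted perfect matchings, with weights $(\theta_{1,1}\theta_{2,2})^k$ and $(\theta_{1,2}\theta_{2,1})^k$ (this agrees with the monomials of Examples 45 and 46). The helper names are hypothetical.

```python
from itertools import permutations
from math import factorial


def cycle_lengths(sigma):
    """Lengths of the cycles of a permutation given as a tuple of images."""
    seen = [False] * len(sigma)
    lengths = []
    for start in range(len(sigma)):
        if seen[start]:
            continue
        length, k = 0, start
        while not seen[k]:
            seen[k] = True
            k = sigma[k]
            length += 1
        lengths.append(length)
    return lengths


def bethe_power_by_cycles(theta, m):
    """(perm_{B,M}(theta))^M as an average over all M! lifting permutations.

    Assumption: a length-k cycle contributes the factor x^k + y^k, where
    x = theta_11 * theta_22 and y = theta_12 * theta_21.
    """
    x = theta[0][0] * theta[1][1]
    y = theta[0][1] * theta[1][0]
    total = 0.0
    for sigma in permutations(range(m)):
        weight = 1.0
        for k in cycle_lengths(sigma):
            weight *= x ** k + y ** k
        total += weight
    return total / factorial(m)


def bethe_power_closed_form(theta, m):
    """Closed-form sum from Example 47."""
    x = theta[0][0] * theta[1][1]
    y = theta[0][1] * theta[1][0]
    return sum(x ** (m - l) * y ** l for l in range(m + 1))


if __name__ == "__main__":
    theta = [[0.5, 2.0], [1.5, 0.7]]  # arbitrary non-negative test matrix
    perm_theta = theta[0][0] * theta[1][1] + theta[0][1] * theta[1][0]
    for m in range(1, 7):
        print(m, bethe_power_by_cycles(theta, m),
              bethe_power_closed_form(theta, m), perm_theta ** m)
```

For large $M$ the $M$-th root of the closed-form sum is dominated by the larger of the two products, which is how the limit $\max(\theta_{1,1}\theta_{2,2}, \theta_{2,1}\theta_{1,2})$ arises.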

For $n > 2$, we leave it as an open problem to obtain explicit expressions for $\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta})$, $M \in \mathbb{Z}_{>0}$, either for the all-one matrix case, or for the general non-negative matrix case.

In conclusion, the above examples show that in general $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}) \neq \operatorname{perm}(\boldsymbol{\theta})$; however, they also show that the Bethe permanent has the potential to give reasonably good estimates, in particular in the cases where the "coupling effect" in the average graph cover is not too strong. Heuristically, this "coupling effect" actually seems to be the worst for $n = 2$ and to become weaker the larger $n$ is.

E. Relevance of Finite Graph Covers 

If the NFG $\mathsf{N}(\boldsymbol{\theta})$ had no cycles then the SPA could be used to exactly compute the partition function. Namely, after a finite number of iterations, the SPA would reach a fixed point and the partition function $Z_{\mathrm{G}}(\mathsf{N}(\boldsymbol{\theta})) = \operatorname{perm}(\boldsymbol{\theta})$ could be computed with the help of an expression like $\exp\bigl(-F^{*}_{\mathrm{Bethe}}(\,\cdot\,)\bigr)$, where $F^{*}_{\mathrm{Bethe}}$ is defined in Lemma 31. However, $\mathsf{N}(\boldsymbol{\theta})$ has cycles: the use of this expression at a fixed point of the SPA is still possible but usually it does not yield the correct partition function. In this subsection, we would like to better understand the source of this suboptimality.

To that end, observe that the SPA is an algorithm that processes information locally on $\mathsf{N}(\boldsymbol{\theta})$, i.e., messages are sent along edges, and function nodes take incoming messages from incident edges, do some computations, and send out new messages along the incident edges. On the one hand, this locality explains the main strengths of the SPA, namely its low complexity and its parallelizability, two key factors for making the SPA a popular algorithm. On the other hand, this locality also explains the main weakness of the SPA. Namely, a locally operating algorithm like the SPA "cannot distinguish" whether it is operating on $\mathsf{N}(\boldsymbol{\theta})$ or on any of its covers [11], [49], [50].

More precisely, let $\tilde{\mathsf{N}}$ be an $M$-cover of $\mathsf{N}(\boldsymbol{\theta})$. Such an $M$-cover "looks locally the same" as $\mathsf{N}(\boldsymbol{\theta})$ in the sense that the local structure of $\tilde{\mathsf{N}}$ is exactly the same as that of $\mathsf{N}(\boldsymbol{\theta})$. (Of course, globally $\tilde{\mathsf{N}}$ and $\mathsf{N}(\boldsymbol{\theta})$ are different because the former NFG contains $M$ times as many function nodes and $M$ times as many edges.) Consequently, if the SPA is run on $\tilde{\mathsf{N}}$ with the same initialization as the SPA on $\mathsf{N}(\boldsymbol{\theta})$ (every initial message is replicated $M$ times), we observe that, because both graphs look locally the same and because the SPA is a locally operating algorithm, after every iteration the messages on $\tilde{\mathsf{N}}$ are exactly the same as the messages on $\mathsf{N}(\boldsymbol{\theta})$, simply replicated $M$ times. In that sense, the SPA "cannot distinguish" whether it is operating on $\mathsf{N}(\boldsymbol{\theta})$ or $\tilde{\mathsf{N}}$, or, in fact, any other $M$-cover of $\mathsf{N}(\boldsymbol{\theta})$. This observation allows us to give the following interpretation of (8) (which is reproduced here for ease of reference):
\[
\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta})
= \sqrt[M]{\Bigl\langle \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) \Bigr\rangle_{\boldsymbol{P} \in \tilde{\Psi}_M}}.
\tag{10}
\]
Namely, because the SPA implicitly tries to compute in parallel the partition function $Z_{\mathrm{G}}\bigl(\mathsf{N}(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}})\bigr) = \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)$ for all $M$-covers of $\mathsf{N}(\boldsymbol{\theta})$, yet it has to give back one real number only, the "best it can do" is to give back the average of these partition functions, i.e., $\bigl\langle \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) \bigr\rangle_{\boldsymbol{P} \in \tilde{\Psi}_M}$. (The $M$-th root that appears in (10) is included so that the result is properly normalized w.r.t. $Z_{\mathrm{G}}(\mathsf{N}(\boldsymbol{\theta})) = \operatorname{perm}(\boldsymbol{\theta})$.)

Let us conclude this section with a comment on a paper by Barvinok [23] that presents bounds on the number of zero/one matrices with prescribed row and column sums. (As already mentioned in Section I-C, in statistical physics terms the approach taken therein can be considered as a mean-field approach.) In terms of NFGs, the quantity of interest is expressed as the partition function of an NFG that has the same topology as $\mathsf{N}(\boldsymbol{\theta})$ but different function nodes.

Section 3.1 of [23] then presents an interpretation of these bounds that has a similar flavor to the graph cover interpretation of the Bethe permanent; however, it also has stark differences. Namely, in terms of NFGs, Section 3.1 of [23] presents an NFG where every function node of the base graph is replicated $M$ times and every edge is replicated $M^2$ times, i.e., all $Mn$ left-hand side function nodes are connected by exactly one edge to all the $Mn$ right-hand side function nodes. In order for this to make sense, the local functions are adapted so that they have $Mn$ arguments instead of $n$ arguments. It is then shown that the $M^2$-th root of the partition function of this new NFG, $M \to \infty$, yields the relevant number in which the bounds are expressed. Despite all the similarities, the differences to finite graph covers are clear:

• There is only one such $M$-fold version of the base graph, whereas the number of $M$-covers of $\mathsf{N}(\boldsymbol{\theta})$ is $(M!)^{n^2}$.

• The number of edges is $M^2 n^2$, whereas the number of edges in an $M$-cover of $\mathsf{N}(\boldsymbol{\theta})$ is $M n^2$.

• The local functions need to be adapted in order to allow for $Mn$ instead of $n$ arguments, whereas the local functions of an $M$-cover of $\mathsf{N}(\boldsymbol{\theta})$ are the same as the local functions of $\mathsf{N}(\boldsymbol{\theta})$.

VII. The Relationship between the Permanent and the Bethe Permanent

In this section we explore the relationship between $\operatorname{perm}(\boldsymbol{\theta})$ and $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$, in particular, if and how $\operatorname{perm}(\boldsymbol{\theta})$ can be upper and lower bounded by expressions that are functions of $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$. For an additional/complementary discussion on this topic we refer to [17].

We start with a lemma that shows that there are non-negative square matrices for which the Bethe permanent can give rather accurate estimates of the permanent, thereby showing the overall potential of the Bethe permanent to be the basis for good upper and lower bounds on the permanent of general non-negative square matrices.

Lemma 48 Let $\boldsymbol{1}_{n \times n}$ be the all-one matrix of size $n \times n$. Then
\[
\frac{\operatorname{perm}(\boldsymbol{1}_{n \times n})}{\operatorname{perm}_{\mathrm{B}}(\boldsymbol{1}_{n \times n})}
= \sqrt{\frac{2 \pi n}{e}} \cdot \bigl(1 + o(1)\bigr),
\]
where $o(1)$ is w.r.t. $n$.

Proof: See Appendix H. □

Although the factor $\sqrt{2 \pi n / e}$ is non-negligible, compared to $\operatorname{perm}(\boldsymbol{1}_{n \times n}) = n!$ it is rather small.
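The asymptotic statement of Lemma 48 can be illustrated numerically. The sketch below rests on two assumptions that are not derived in this section: first, that for the all-one matrix the Bethe free energy is minimized at the uniform matrix $\gamma_{i,j} = 1/n$ (plausible by symmetry and convexity); second, that the Bethe entropy has the form $H_{\mathrm{B}}(\boldsymbol{\gamma}) = -\sum_{i,j} \gamma_{i,j} \log \gamma_{i,j} + \sum_{i,j} (1 - \gamma_{i,j}) \log(1 - \gamma_{i,j})$, which is the $\kappa = 1$ case of the expression in Lemma 58. Under these assumptions $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{1}_{n \times n}) = n^n (1 - 1/n)^{n(n-1)}$. The function names are hypothetical.

```python
from math import e, exp, lgamma, log, pi, sqrt


def log_bethe_permanent_all_one(n):
    """log perm_B of the n x n all-one matrix, n >= 2.

    Assumption: the minimum of the Bethe free energy is attained at the
    uniform doubly stochastic matrix gamma_ij = 1/n, where U_B = 0 and
    F_B = -H_B.
    """
    g = 1.0 / n
    return n * n * (-g * log(g) + (1.0 - g) * log(1.0 - g))


def permanent_to_bethe_ratio(n):
    """perm(1_{n x n}) / perm_B(1_{n x n}), using perm(1_{n x n}) = n!."""
    return exp(lgamma(n + 1) - log_bethe_permanent_all_one(n))


def lemma_48_asymptote(n):
    """Leading-order term sqrt(2 * pi * n / e) from Lemma 48."""
    return sqrt(2.0 * pi * n / e)


if __name__ == "__main__":
    for n in (2, 5, 10, 100, 1000):
        print(n, permanent_to_bethe_ratio(n), lemma_48_asymptote(n))
```

Working with logarithms (via `lgamma`) avoids overflow of $n!$ and $n^n$ for larger $n$.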

A. Lower Bounds on the Permanent of the Matrix $\boldsymbol{\theta}$

In this subsection we study lower bounds on $\operatorname{perm}(\boldsymbol{\theta})$ based on $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$.

Theorem 49 (Gurvits [10]) It holds that
\[
\frac{\operatorname{perm}(\boldsymbol{\theta})}{\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})} \geq 1.
\]

Proof: This result was recently shown by Gurvits [10]. Roughly speaking, its elegant proof is based on first expressing $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ in terms of a stationary point of $F_{\mathrm{B},\mathsf{N}(\boldsymbol{\theta})}$ and then applying an inequality due to Schrijver [51]. For more details, along with a discussion of this result's relationship to the results in [52], [53], we refer to [10]. □

Corollary 50 (Gurvits [10]) For any $\boldsymbol{\gamma} \in \Gamma_{n \times n}$ it holds that
\[
\frac{\operatorname{perm}(\boldsymbol{\theta})}{\exp\bigl(-F_{\mathrm{B},\mathsf{N}(\boldsymbol{\theta})}(\boldsymbol{\gamma})\bigr)} \geq 1.
\]

Proof: This is a straightforward consequence of Theorem 49 and Definitions 11 and 12. □

This corollary is significant when one is not willing to run the SPA, but one has a reasonably good estimate of the $\boldsymbol{\gamma} \in \Gamma_{n \times n}$ that minimizes $F_{\mathrm{B},\mathsf{N}(\boldsymbol{\theta})}$. This approach is for example interesting when one wants to obtain analytical lower bounds on the permanent of some parametrized class of non-negative square matrices.

In the Allerton 2010 version of this paper we also stated 
the inequality that appears in Theorem 49. However, while 
writing the present paper we realized that our "proof" of that 
theorem had a flaw, which, so far, we have not been able to 
fix. However, we still think that our proof strategy can work 
out and possibly give an alternative viewpoint of Schrijver's 
inequality that features prominently in [10]. In that respect, we 
list below some special cases of matrices $\boldsymbol{\theta}$ for which our proof
strategy works, along with conjectures that, if true, would give
an alternative proof of Theorem 49 in its full generality.



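Conjecture 51 For any $M \in \mathbb{Z}_{>0}$ it holds that
\[
\Bigl\langle \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) \Bigr\rangle_{\boldsymbol{P} \in \tilde{\Psi}_M}
\leq \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^M.
\]
Possibly also the following, stronger, statement is true: for any $M \in \mathbb{Z}_{>0}$ and any $\boldsymbol{P} \in \tilde{\Psi}_M$ it holds that
\[
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) \leq \bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^M.
\]
□

Theorem 49 would then follow from
\begin{align*}
\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})
&\overset{(a)}{=} \limsup_{M \to \infty} \operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta}) \\
&\overset{(b)}{=} \limsup_{M \to \infty} \sqrt[M]{\Bigl\langle \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) \Bigr\rangle_{\boldsymbol{P} \in \tilde{\Psi}_M}} \\
&\overset{(c)}{\leq} \limsup_{M \to \infty} \sqrt[M]{\bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^M} \\
&= \limsup_{M \to \infty} \operatorname{perm}(\boldsymbol{\theta}) \\
&\overset{(d)}{=} \operatorname{perm}(\boldsymbol{\theta}),
\end{align*}
where at step (a) we have used Theorem 39, where at step (b) we have used Definition 38, where at step (c) we have used the weaker part of Conjecture 51, and where step (d) follows from evaluating the (now trivial) limit $M \to \infty$.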

We now list some special matrices $\boldsymbol{\theta}$ for which Conjecture 51 is true.

• Conjecture 51 is true for $\boldsymbol{\theta} = \boldsymbol{1}_{n \times n}$. (The proof is given in Appendix I.)

• Conjecture 51 is true for all matrices that were studied in Section VI.

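Actually, the results in Section VI suggest the following, stronger version of Conjecture 51.

Conjecture 52 Fix some $M \in \mathbb{Z}_{>0}$ and consider the expressions
\[
\Bigl\langle \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) \Bigr\rangle_{\boldsymbol{P} \in \tilde{\Psi}_M}
\quad \text{and} \quad
\bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^M
\]
as polynomials in the indeterminates $\{\theta_{i,j}\}_{i,j}$. We conjecture that the coefficient of every monomial of the first polynomial is upper bounded by the coefficient of the corresponding monomial of the second polynomial.

Possibly also the following, stronger, statement is true. Fix some $M \in \mathbb{Z}_{>0}$ and $\boldsymbol{P} \in \tilde{\Psi}_M$, and consider the expressions
\[
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)
\quad \text{and} \quad
\bigl(\operatorname{perm}(\boldsymbol{\theta})\bigr)^M
\]
as polynomials in the indeterminates $\{\theta_{i,j}\}_{i,j}$. We conjecture that the coefficient of every monomial of the first polynomial is upper bounded by the coefficient of the corresponding monomial of the second polynomial. □

B. Upper Bounds on the Permanent of the Matrix $\boldsymbol{\theta}$

In this subsection we list conjectures and open problems w.r.t. upper bounds on $\operatorname{perm}(\boldsymbol{\theta})$ based on $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$.

Conjecture 53 (Gurvits [10]) Let $\boldsymbol{\theta}$ be an arbitrary non-negative matrix of size $n \times n$. For even $n$ it is conjectured that
\[
\frac{\operatorname{perm}(\boldsymbol{\theta})}{\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})} \leq \sqrt{2}^{\,n},
\tag{11}
\]
with a similar conjecture for odd $n$. Note that (11) holds with equality for the matrix $\boldsymbol{\theta} = \boldsymbol{I}_{(n/2) \times (n/2)} \otimes \boldsymbol{1}_{2 \times 2}$, which is the Kronecker product of an identity matrix of size $(n/2) \times (n/2)$ and the all-one matrix of size $2 \times 2$. □

The above conjecture replaces the conjecture that we made in the Allerton 2010 version of this paper where, for fixed $n$, the largest ratio $\operatorname{perm}(\boldsymbol{\theta}) / \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ was thought to be obtained for the all-one matrix of size $n \times n$.

Besides proving the bound in Conjecture 53, it would be desirable to prove statements of the form
\[
\Pr_{\boldsymbol{\theta} \in \Theta} \left( \frac{\operatorname{perm}(\boldsymbol{\theta})}{\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})} \leq \tau \right) \geq 1 - \varepsilon,
\]
where $\Theta$ is some ensemble of random matrices of size $n \times n$, where $\tau$ is some positive real number, and where $\varepsilon$ is some small positive number. For example, for the ensemble of $n \times n$ matrices where the matrix entries are chosen uniformly and independently between 0 and 1, we conjecture that $\operatorname{perm}(\boldsymbol{\theta}) / \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ is, with high probability, upper bounded by the ratio that appears in Lemma 48. (Note that this ratio is much smaller than the ratio that appears in Conjecture 53.)

C. Closeness of the Permanent to the Bethe Permanent

In this subsection we list some cases where $\operatorname{perm}(\boldsymbol{\theta})$ is relatively close to $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$. We start with an auxiliary result that relates the Bethe permanent of a lifted matrix to the Bethe permanent of the base matrix.

Lemma 54 For any $M \in \mathbb{Z}_{>0}$ and any $\boldsymbol{P} \in \tilde{\Psi}_M$ it holds that
\[
\operatorname{perm}_{\mathrm{B}}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) = \bigl(\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})\bigr)^M.
\]

Proof: See Appendix J. □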



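Theorem 55 For any $\alpha > 1$ and any $M \geq M_\alpha$, the majority of the matrices $\{\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\}_{\boldsymbol{P} \in \tilde{\Psi}_M}$ satisfies
\[
1 \leq \frac{\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)}{\operatorname{perm}_{\mathrm{B}}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)} < \alpha^M.
\]
Here $M_\alpha$ is a parameter that depends on $\alpha$.

Proof: The first inequality follows from Theorem 49. We prove the second inequality by contradiction. So, assume that there is an $\alpha > 1$ and a constant $M_\alpha$ such that for all $M \geq M_\alpha$ the set $\tilde{\Psi}'_M \subseteq \tilde{\Psi}_M$ of all lifted matrices $\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}$ that satisfy $\operatorname{perm}(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}) \geq \alpha^M \cdot \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}})$ has size at least $|\tilde{\Psi}_M| / 2$. Then
\begin{align*}
\operatorname{perm}_{\mathrm{B},M}(\boldsymbol{\theta})
&\overset{(a)}{=} \sqrt[M]{\Bigl\langle \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr) \Bigr\rangle_{\boldsymbol{P} \in \tilde{\Psi}_M}} \\
&\overset{(b)}{=} \sqrt[M]{\frac{1}{|\tilde{\Psi}_M|} \sum_{\boldsymbol{P} \in \tilde{\Psi}_M} \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)} \\
&\geq \sqrt[M]{\frac{1}{|\tilde{\Psi}_M|} \sum_{\boldsymbol{P} \in \tilde{\Psi}'_M} \operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)} \\
&\overset{(c)}{\geq} \sqrt[M]{\frac{1}{|\tilde{\Psi}_M|} \sum_{\boldsymbol{P} \in \tilde{\Psi}'_M} \alpha^M \cdot \operatorname{perm}_{\mathrm{B}}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)} \\
&\overset{(d)}{=} \sqrt[M]{\frac{|\tilde{\Psi}'_M|}{|\tilde{\Psi}_M|}} \cdot \alpha \cdot \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}) \\
&\overset{(e)}{\geq} 2^{-1/M} \cdot \alpha \cdot \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}),
\end{align*}
where at step (a) we have used Definition 38, where at step (b) we have replaced the angular brackets by the corresponding normalized sum, where at step (c) we have used the assumption, where at step (d) we have used Lemma 54, and where at step (e) we have again used the assumption. (The unlabeled inequality holds because all summands are non-negative.) However, taking $\limsup_{M \to \infty}$ on both sides of the above expression, we see that we obtain a contradiction w.r.t. Theorem 39. □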

The following example partially corroborates Theorem 55. 

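Example 56 For some positive integer $M$, consider the matrix
\[
\boldsymbol{\theta}^{\uparrow \boldsymbol{P}} =
\begin{pmatrix}
\theta_{1,1} \boldsymbol{I} & \theta_{1,2} \boldsymbol{I} \\
\theta_{2,1} \boldsymbol{I} & \theta_{2,2} \boldsymbol{P}_{2,2}
\end{pmatrix},
\]
where $\boldsymbol{I}$ is the identity matrix of size $M \times M$ and where $\boldsymbol{P}_{2,2}$ is a once cyclically left-shifted identity matrix of size $M \times M$. Then
\begin{align*}
\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)
&= \theta_{1,1}^M \theta_{2,2}^M + \theta_{1,2}^M \theta_{2,1}^M, \\
\operatorname{perm}_{\mathrm{B}}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)
&= \bigl(\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})\bigr)^M
= \bigl(\max(\theta_{1,1}\theta_{2,2},\, \theta_{1,2}\theta_{2,1})\bigr)^M,
\end{align*}
where the first result is a consequence of the observation that the underlying graph has exactly one cycle, i.e., only two perfect matchings, and where the second result follows from Lemmas 40 and 54. Therefore,
\[
\frac{\operatorname{perm}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)}{\operatorname{perm}_{\mathrm{B}}\bigl(\boldsymbol{\theta}^{\uparrow \boldsymbol{P}}\bigr)} \leq 2.
\]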

Note that the right-hand side of the above expression does not 
only grow sub-exponentially in M, it does not grow at all. D 
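The two expressions of Example 56 can be checked numerically for small $M$. The sketch below is an illustration only: the block structure of the lifted matrix (identity lifts for three entries, a cyclic shift for entry $(2,2)$) is assumed as in Example 45, the value of $\operatorname{perm}_{\mathrm{B}}$ is taken from the formula quoted from Lemmas 40 and 54 rather than computed by the SPA, and the helper names are hypothetical.

```python
def permanent(matrix):
    """Permanent by row-wise expansion that skips zero entries (sparse-friendly)."""
    n = len(matrix)

    def expand(row, used_columns):
        if row == n:
            return 1.0
        total = 0.0
        for col in range(n):
            entry = matrix[row][col]
            if entry != 0 and not (used_columns >> col) & 1:
                total += entry * expand(row + 1, used_columns | (1 << col))
        return total

    return expand(0, 0)


def cyclic_lift(theta, m):
    """2M x 2M lift with identity blocks and one cyclically shifted block.

    Assumption: the shifted block sits at position (2,2); the shift direction
    does not matter here because either direction yields a single cycle.
    """
    lift = [[0.0] * (2 * m) for _ in range(2 * m)]
    for k in range(m):
        lift[k][k] = theta[0][0]
        lift[k][m + k] = theta[0][1]
        lift[m + k][k] = theta[1][0]
        lift[m + k][m + (k + 1) % m] = theta[1][1]
    return lift


def ratio_example_56(theta, m):
    """perm(lift) / perm_B(lift), with perm_B(lift) = max(x, y)^M as quoted in the text."""
    x = theta[0][0] * theta[1][1]
    y = theta[0][1] * theta[1][0]
    return permanent(cyclic_lift(theta, m)) / max(x, y) ** m


if __name__ == "__main__":
    theta = [[0.5, 2.0], [1.5, 0.7]]  # arbitrary non-negative test matrix
    for m in range(1, 7):
        print(m, permanent(cyclic_lift(theta, m)), ratio_example_56(theta, m))
```

The ratio stays between 1 and 2 for every $M$, and it reaches 2 exactly when $\theta_{1,1}\theta_{2,2} = \theta_{1,2}\theta_{2,1}$.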



Let us conclude this subsection with the following remark. As already mentioned, the proof of Theorem 49 takes advantage of an inequality by Schrijver [51], and therefore the closeness of $\operatorname{perm}(\boldsymbol{\theta})$ to $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ is linked with the tightness of Schrijver's inequality. Now, interestingly enough, when Schrijver demonstrates a certain asymptotic tightness of his inequality, cf. [51, Section 3], he implicitly evaluates and compares both sides of his inequality for some finite cover of a certain graph.



D. Open Problems on the Relationship between the Permanent 
and the Bethe Permanent 

There are also classes of structured matrices for which it would be interesting to better understand the relationship between the permanent and the Bethe permanent. For example, the permanent of the matrix
\[
\boldsymbol{\theta} =
\begin{pmatrix}
\alpha_1^{\beta_1} & \cdots & \alpha_1^{\beta_m} & 1 & \cdots & 1 \\
\vdots & & \vdots & \vdots & & \vdots \\
\alpha_n^{\beta_1} & \cdots & \alpha_n^{\beta_m} & 1 & \cdots & 1
\end{pmatrix}
\]
with $0 \leq m \leq n$, real numbers $\alpha_\ell \geq 0$, $\ell \in [n]$, and real numbers $\beta_\ell$, $\ell \in [m]$, turns up in a variety of contexts.

• When $\sum_{\ell \in [n]} \alpha_\ell = 1$ and the $\beta_\ell$ are non-negative integers then $\operatorname{perm}(\boldsymbol{\theta})$ corresponds to the probability of the pattern of a sequence (see, e.g., [54]).

• When $m = n$ and $\beta_\ell = n - 1 - \ell$, $\ell \in [n]$, then $\operatorname{perm}(\boldsymbol{\theta})$ appears in the analysis of list ordering algorithms (see, e.g., [55]) or in the analysis of source coding algorithms (see, e.g., [56]). Note that in this case, $\boldsymbol{\theta}$ is a Vandermonde matrix.

Moreover, given the fact that the above $\boldsymbol{\theta}$ depends only on (at most) $2n$ parameters (and not on $n^2$ parameters as $\boldsymbol{\theta}$ in (1)), one wonders if speed-ups in the SPA-based computation of $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$ are possible.

Moreover, in some applications one is not interested in the absolute value of the permanent, only the relative value in the sense that for two matrices $\boldsymbol{\theta}$ and $\boldsymbol{\theta}'$ one wants to know which one has the larger permanent. Therefore, for some suitable stochastic setting it would be desirable to state with what probability $\operatorname{perm}(\boldsymbol{\theta}) \leq \operatorname{perm}(\boldsymbol{\theta}')$ is equivalent to $\operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}) \leq \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta}')$. Some very encouraging initial investigations of this topic have been presented in [8, Section 4.2].

VIII. Fractional Bethe Permanent

The terms that appear in $H_{\mathrm{B}}(\boldsymbol{\gamma})$ in Lemma 14 all have either coefficient $+1$ or $-1$. The main idea behind the fractional Bethe entropy function is to allow these coefficients to take on also other values. This is done towards the goal of obtaining a modified Bethe free energy function whose minimum resembles the minimum of the Gibbs free energy function even more. Such generalizations of the Bethe entropy function were for example considered in [57]-[62] and a combinatorial characterization of the fractional Bethe entropy function was discussed in [45]. In particular, for the permanent estimation problem such generalizations are extensively studied in the very recent paper by A. B. Yedidia and Chertkov [17], to which we refer for additional discussion on this topic.

As we will see in this section, if the modifications to the Bethe entropy function are applied within some suitable limits, the concavity of the modified Bethe entropy function (and therefore the convexity of the modified Bethe free energy function) will be maintained.

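Definition 57 Let
\[
\kappa \triangleq \bigl( \{\kappa_i\}_{i \in \mathcal{I}},\ \{\kappa_j\}_{j \in \mathcal{J}},\ \{\kappa_{i,j}\}_{(i,j) \in \mathcal{I} \times \mathcal{J}} \bigr)
\]
be a collection of real values. We define the $\kappa$-fractional Bethe entropy function $H^{(\kappa)}_{\mathrm{B}} : \Gamma_{n \times n} \to \mathbb{R}$ to be the function that is obtained from the expression for $H_{\mathrm{B}}(\boldsymbol{\gamma})$ in Lemma 14 by weighting the terms indexed by $i \in \mathcal{I}$, $j \in \mathcal{J}$, and $(i,j) \in \mathcal{I} \times \mathcal{J}$ with $\kappa_i$, $\kappa_j$, and $\kappa_{i,j}$, respectively. (Clearly, if all values in $\kappa$ equal 1 then $H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma}) = H_{\mathrm{B}}(\boldsymbol{\gamma})$, with $H_{\mathrm{B}}(\boldsymbol{\gamma})$ as shown in Lemma 14.) □

Lemma 58 The fractional Bethe entropy function from Definition 57 can also be expressed as follows:
\[
H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})
= - \sum_{i,j} (\kappa_i + \kappa_j - \kappa_{i,j}) \cdot \gamma_{i,j} \log(\gamma_{i,j})
+ \sum_{i,j} \kappa_{i,j} \cdot (1 - \gamma_{i,j}) \log(1 - \gamma_{i,j}).
\]
(If all values in $\kappa$ equal 1 then $H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma}) = H_{\mathrm{B}}(\boldsymbol{\gamma})$, with $H_{\mathrm{B}}(\boldsymbol{\gamma})$ as shown in Corollary 15.)

Proof: Follows from combining Definition 57 and Lemma 14. □

The following definition generalizes Definitions 11 and 12 and Corollary 15.

Definition 59 We define the $\kappa$-fractional Bethe free energy function to be
\[
F^{(\kappa)}_{\mathrm{B}} : \Gamma_{n \times n} \to \mathbb{R},
\qquad
\boldsymbol{\gamma} \mapsto U_{\mathrm{B}}(\boldsymbol{\gamma}) - H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma}),
\]
and the $\kappa$-fractional Bethe permanent to be
\[
\operatorname{perm}^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\theta}) \triangleq \exp\Bigl( - \min_{\boldsymbol{\gamma} \in \Gamma_{n \times n}} F^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma}) \Bigr).
\]
□

Footnote: One might also modify $U_{\mathrm{B}}(\boldsymbol{\gamma})$; however, we do not pursue this option here.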

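The following theorem gives a sufficient condition on $\kappa$ so that the $\kappa$-fractional Bethe entropy function is concave in $\boldsymbol{\gamma}$, thereby generalizing Theorem 22.

Theorem 60 If $\kappa$ is such that
\begin{align*}
\kappa_i &\geq 0 && (i \in \mathcal{I}), \\
\kappa_j &\geq 0 && (j \in \mathcal{J}), \\
\kappa_i + \kappa_j &\geq 2 \kappa_{i,j} && ((i,j) \in \mathcal{I} \times \mathcal{J}),
\end{align*}
then $H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})$ is a concave function of $\boldsymbol{\gamma}$ and $F^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})$ is a convex function of $\boldsymbol{\gamma}$.

Proof: We have
\begin{align*}
H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})
&\overset{(a)}{=} - \sum_{i,j} (\kappa_i + \kappa_j - \kappa_{i,j}) \cdot \gamma_{i,j} \log(\gamma_{i,j})
+ \sum_{i,j} \kappa_{i,j} \cdot (1 - \gamma_{i,j}) \log(1 - \gamma_{i,j}) \\
&\overset{(b)}{=} \sum_{i} \frac{\kappa_i}{2} \cdot S(\boldsymbol{\gamma}_i)
+ \sum_{j} \frac{\kappa_j}{2} \cdot S(\boldsymbol{\gamma}_j)
+ \sum_{i,j} \left( \frac{\kappa_i + \kappa_j}{2} - \kappa_{i,j} \right) \cdot h(\gamma_{i,j}),
\end{align*}
where $\boldsymbol{\gamma}_i$ and $\boldsymbol{\gamma}_j$ denote the $i$-th row and the $j$-th column of $\boldsymbol{\gamma}$, respectively, where at step (a) we have used Lemma 58, and where at step (b) we have used the $S$-function as specified in Definition 19 and have introduced the binary entropy function $h : [0, 1] \to \mathbb{R}$, $\xi \mapsto -\xi \log(\xi) - (1 - \xi) \log(1 - \xi)$. If $\kappa_i \geq 0$, $\kappa_j \geq 0$, and $\frac{\kappa_i + \kappa_j}{2} - \kappa_{i,j} \geq 0$ (the latter being equivalent to $\kappa_i + \kappa_j \geq 2 \kappa_{i,j}$), then the concavity of $H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})$ in $\boldsymbol{\gamma}$ follows from Theorem 20, the well-known concavity of the binary entropy function, and the fact that the sum of concave functions is a concave function.

The convexity of $F^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})$ in $\boldsymbol{\gamma}$ follows from the concavity of $H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})$ in $\boldsymbol{\gamma}$ and the linearity of $U_{\mathrm{B}}(\boldsymbol{\gamma})$ in $\boldsymbol{\gamma}$. □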
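The decomposition used in the proof of Theorem 60 can be checked numerically. The following sketch is an illustration under stated assumptions: the function $S$ is taken to be $S(\boldsymbol{\xi}) = -\sum_\ell \xi_\ell \log \xi_\ell + \sum_\ell (1 - \xi_\ell) \log(1 - \xi_\ell)$ (the form in which it is expanded in Appendix A), random doubly stochastic matrices are generated as convex combinations of permutation matrices, and all function names are hypothetical. It compares the expression of Lemma 58 with the sum of $S$-terms and binary-entropy terms, and it runs a midpoint concavity test for coefficients that satisfy the conditions of the theorem.

```python
import random
from math import log


def xlogx(x):
    """x * log(x) with the convention 0 * log(0) = 0."""
    return x * log(x) if x > 0 else 0.0


def s_function(xi):
    """Assumed form of the S-function of Definition 19."""
    return sum(-xlogx(x) + xlogx(1.0 - x) for x in xi)


def binary_entropy(x):
    return -xlogx(x) - xlogx(1.0 - x)


def fractional_entropy(gamma, k_row, k_col, k_edge):
    """Fractional Bethe entropy in the form of Lemma 58."""
    n = len(gamma)
    return sum(
        -(k_row[i] + k_col[j] - k_edge[i][j]) * xlogx(gamma[i][j])
        + k_edge[i][j] * xlogx(1.0 - gamma[i][j])
        for i in range(n) for j in range(n)
    )


def decomposed_entropy(gamma, k_row, k_col, k_edge):
    """Right-hand side of step (b) in the proof of Theorem 60."""
    n = len(gamma)
    rows = sum(0.5 * k_row[i] * s_function(gamma[i]) for i in range(n))
    cols = sum(0.5 * k_col[j] * s_function([gamma[i][j] for i in range(n)])
               for j in range(n))
    edges = sum((0.5 * (k_row[i] + k_col[j]) - k_edge[i][j]) * binary_entropy(gamma[i][j])
                for i in range(n) for j in range(n))
    return rows + cols + edges


def satisfies_theorem_60(k_row, k_col, k_edge):
    n = len(k_row)
    return (all(k >= 0 for k in k_row) and all(k >= 0 for k in k_col)
            and all(k_row[i] + k_col[j] >= 2 * k_edge[i][j]
                    for i in range(n) for j in range(n)))


def random_doubly_stochastic(n, rng, terms=8):
    """Convex combination of random permutation matrices."""
    weights = [rng.random() for _ in range(terms)]
    total = sum(weights)
    gamma = [[0.0] * n for _ in range(n)]
    for w in weights:
        sigma = list(range(n))
        rng.shuffle(sigma)
        for i, j in enumerate(sigma):
            gamma[i][j] += w / total
    return gamma
```

For instance, the coefficients $\kappa_i = \kappa_j = 1$ and $\kappa_{i,j} = 1 - 1/(2n)$ pass `satisfies_theorem_60`, whereas $\kappa_i = \kappa_j = 1$ and $\kappa_{i,j} = 2$ do not.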

Lemma 61 An interesting choice for $\kappa$ is
\begin{align*}
\kappa_i &= 1 && (i \in \mathcal{I}), \\
\kappa_j &= 1 && (j \in \mathcal{J}), \\
\kappa_{i,j} &= 1 - \frac{1}{2n} && ((i,j) \in \mathcal{I} \times \mathcal{J}).
\end{align*}
The resulting $H^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})$ is a concave function of $\boldsymbol{\gamma}$ and the resulting $F^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\gamma})$ is a convex function of $\boldsymbol{\gamma}$. Moreover, letting $\boldsymbol{1}_{n \times n}$ be the all-one matrix of size $n \times n$, we obtain
\[
\frac{\operatorname{perm}(\boldsymbol{1}_{n \times n})}{\operatorname{perm}^{(\kappa)}_{\mathrm{B}}(\boldsymbol{1}_{n \times n})}
= \frac{\sqrt{2\pi}}{e} \cdot \bigl(1 + o(1)\bigr) = 0.922\ldots \cdot \bigl(1 + o(1)\bigr).
\]
(Note that, in contrast to Lemma 48, there is no $\sqrt{n}$-factor on the right-hand side of the above expression.)

Proof: See Appendix K. □

Fig. 8. Illustration of the ratio $\operatorname{perm}(\boldsymbol{1}_{n \times n}) / \operatorname{perm}^{(\kappa)}_{\mathrm{B}}(\boldsymbol{1}_{n \times n})$ for the special choice of $\kappa$ in Lemma 61, when $n$ varies from 2 to 50.

Let us make a few comments about the choice of $\kappa$ in Lemma 61.

• Figure 8 shows the exact ratios for $n$ from 2 to 50. In particular, note that for $n = 2$ we have
\[
\frac{\operatorname{perm}(\boldsymbol{1}_{2 \times 2})}{\operatorname{perm}^{(\kappa)}_{\mathrm{B}}(\boldsymbol{1}_{2 \times 2})} = 1.
\]

• For even integers $n$ and for the choice of $\kappa$ from Lemma 61, the matrix $\boldsymbol{\theta} = \boldsymbol{I}_{(n/2) \times (n/2)} \otimes \boldsymbol{1}_{2 \times 2}$ yields the ratio
\[
\frac{\operatorname{perm}(\boldsymbol{\theta})}{\operatorname{perm}^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\theta})} = 1.
\]
This is in stark contrast to Conjecture 53 where $\boldsymbol{\theta}$ represents the conjectured "worst-case" matrix for the ratio $\operatorname{perm}(\boldsymbol{\theta}) / \operatorname{perm}_{\mathrm{B}}(\boldsymbol{\theta})$.

• For integers $n$ and $k$ such that $k$ divides $n$ we have
\[
(0.922\ldots)^{n/k} \leq \frac{\operatorname{perm}(\boldsymbol{\theta})}{\operatorname{perm}^{(\kappa)}_{\mathrm{B}}(\boldsymbol{\theta})} \leq 1
\]
for the matrix $\boldsymbol{\theta} = \boldsymbol{I}_{(n/k) \times (n/k)} \otimes \boldsymbol{1}_{k \times k}$.

Let us conclude this section on the fractional Bethe entropy function with a few comments.

• The SPA message update equations in Section V need to be modified so that its fixed points correspond to stationary points of the fractional Bethe free energy, i.e., so that a modified version of the theorem by Yedidia, Freeman, and Weiss [9] holds. In contrast to the SPA message update equations in Section V, the modified SPA message update equations will be such that the right-going messages depend not only on the previous left-going messages but also on the previous right-going messages, and such that the left-going messages depend not only on the previous right-going messages but also on the previous left-going messages. (We omit the details.) Moreover, the convergence analysis in Section V has to be revisited.

• We leave it as an open problem to explore the $\kappa$ parameter space and to find fractional Bethe permanents for which interesting statements can be made, in particular for which a statement like the one in Theorem 49 can be made.

IX. Conjectures

It is an interesting challenge to look at theorems involving permanents and to prove that the theorems still hold if the permanents in these theorems are replaced by Bethe permanents. Let us mention two conjectures along these lines.

A. Perm-Pseudo-Codewords

The following conjecture is based on a theorem in [63] involving permanents of submatrices of a parity-check matrix.



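Definition 62 Let $\mathcal{C}$ be a binary linear code described by a parity-check matrix $\boldsymbol{H} \in \mathbb{F}_2^{m \times n}$, $m < n$. For a size-$(m+1)$ subset $\mathcal{S}$ of the column index set $\mathcal{I}(\boldsymbol{H})$ we define the Bethe perm-vector based on $\mathcal{S}$ to be the vector $\boldsymbol{\omega} \in \mathbb{R}^n$ with components
\[
\omega_i \triangleq
\begin{cases}
\operatorname{perm}_{\mathrm{B}}\bigl(\boldsymbol{H}_{\mathcal{S} \setminus i}\bigr) & \text{if } i \in \mathcal{S}, \\
0 & \text{otherwise},
\end{cases}
\]
where $\boldsymbol{H}_{\mathcal{S} \setminus i}$ is the submatrix of $\boldsymbol{H}$ consisting of all the columns of $\boldsymbol{H}$ whose index is in the set $\mathcal{S} \setminus \{i\}$. □

Conjecture 63 Let $\mathcal{C}$ be a binary linear code described by the parity-check matrix $\boldsymbol{H} \in \mathbb{F}_2^{m \times n}$, $m < n$, let $\mathcal{K}(\boldsymbol{H})$ be the fundamental cone associated with $\boldsymbol{H}$ [49], [50], and let $\mathcal{S}$ be a size-$(m+1)$ subset of $\mathcal{I}(\boldsymbol{H})$. The Bethe perm-vector $\boldsymbol{\omega}$ based on $\mathcal{S}$ is a pseudo-codeword of $\boldsymbol{H}$, i.e.,
\[
\boldsymbol{\omega} \in \mathcal{K}(\boldsymbol{H}).
\tag{12}
\]
□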


B. Permanent-Based Kernels 

Based on a result by Cuturi [64], Huang and Jebara [8] 
made the following conjecture. 



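Conjecture 64 (Huang and Jebara [8]) Let $n$ be a positive integer and let $\mathcal{X}$ be a set endowed with a kernel $\kappa$. Let $X \triangleq \{x_1, \ldots, x_n\} \in \mathcal{X}^n$ and $Y \triangleq \{y_1, \ldots, y_n\} \in \mathcal{X}^n$. Then
\[
K_{\operatorname{perm}_{\mathrm{B}}} : (X, Y) \mapsto \operatorname{perm}_{\mathrm{B}}\Bigl( \bigl[ \kappa(x_i, y_j) \bigr]_{1 \leq i \leq n,\, 1 \leq j \leq n} \Bigr)
\]
is a positive definite kernel on $\mathcal{X}^n \times \mathcal{X}^n$. □

X. Conclusions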

In this paper, we have pursued a graphical-model-based 
approach to approximating the permanent of a non-negative 
square matrix, the resulting approximation being called the 
Bethe permanent. We have seen that the associated functions, 
like the Bethe entropy function and the Bethe free energy function, are remarkably well behaved for a graphical model with a non-trivial cycle structure. In that respect, an important part is played by a theorem by Birkhoff and von Neumann (cf. Theorem 3). Moreover, the SPA can be used to efficiently find the minimum of the Bethe free energy function and thereby the Bethe permanent. We have also presented a graph-cover-based analysis that gives additional insights into the inner workings of the Bethe permanent, its strengths, and its weaknesses, and we have commented on Bethe-permanent-based upper and lower bounds on the permanent. Along the way we have stated several conjectures and open problems that, if answered one way or the other, could further elucidate the relationship between the permanent and the Bethe permanent.

Appendix A
Proof of Theorem 20

Observe that once the concavity of $S$ is established, it is straightforward to verify the claim in the theorem statement that $S(\boldsymbol{\xi}) \geq 0$ for all $\boldsymbol{\xi} \in \Pi_{[n]}$. Indeed, because $\Pi_{[n]}$ is a polytope with $n$ vertices, because $S$ takes on the value 0 at each of these vertices, and because $S$ is concave, this statement is true.

Therefore, let us focus on the concavity statement. Clearly, for $n = 2$ the statement can easily be verified and so the rest of this appendix will only discuss the case $n \geq 3$.

By definition, a multi-dimensional function is concave if it is a concave function along any straight line in its domain. Towards showing that this is indeed the case for $S$, let us fix an arbitrary point $\boldsymbol{\xi} \in \Pi_{[n]}$ and an arbitrary direction $\hat{\boldsymbol{\xi}} \in \mathbb{R}^n \setminus \{\boldsymbol{0}\}$ such that the function $\boldsymbol{\xi}(t) \triangleq \boldsymbol{\xi} + t \cdot \hat{\boldsymbol{\xi}}$ satisfies $\boldsymbol{\xi}(t) \in \Pi_{[n]}$ for a suitable $t$-interval around 0 (to be defined later). We need to distinguish three different cases that will be discussed separately in the following subsections:

1) The point $\boldsymbol{\xi}$ is in the interior of $\Pi_{[n]}$.

2) The point $\boldsymbol{\xi}$ is at a vertex of $\Pi_{[n]}$.

3) The point $\boldsymbol{\xi}$ is neither in the interior nor at a vertex of $\Pi_{[n]}$.

A. The Point ^ is in the Interior o/n[„] 

It is straightforward to see that the direction vector ^ must 
satisfy 

1^6=0, (13) 



otherwise ^(t) G ![[„] holds only for t = 0. Therefore, 
we assume that (13) is satisfied. Moreover, because ^ G 
interior(n[„]), we have < ^^ < 1, £ G [n], and we can 
find an e > such that ^(i) G n[„] for — e ^ t ^ e. We will 
now show that the function 1 1-^. S'(^(t)) is concave at i = 0. 
We start by computing the first-order derivative 



dt 



d6(t) 



and the second-order derivative 



(a) 






^ b(f] ^ ^ 1 



e^w 



where at step (a) we have used Lemma 18. In particular, at 
i = we have 

d2 



dt2 



s{m) 



t=0 



where 5^, £ G [n], is defined as 



e 



F- + F- 






E^^ 



E^^' 



1-2^^ 



(14) 



The proof will be finished once we have shown that 
^S'(^(i)) ^ at t = 0, which is equivalent to the condition 
that 



^ (Sf s^ 0. 



(15) 



We show this by separately considering two cases, the first 
case being ^ G interior(n[„]) n [0,1/2]", the second case 
being i G interior (n[„]) \ [0, 1/2]". 

The first case, ^ G interior (11 [„]) n [0, 1/2]", is relatively 
straightforward. Namely, for all I G [n] we have < ^^ ^ 1/2, 
which implies 1 — 2^^ ^ 0, which in turn implies 5i ^ 0, and 
so (15) is satisfied. 

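The second case, $\xi \in \mathrm{interior}(\Pi_{[n]}) \setminus [0,1/2]^n$, needs somewhat more work. We start by observing that there is a unique $\ell^* \in [n]$ such that $\xi_{\ell^*} > 1/2$. (Note that there can only be one such $\ell^* \in [n]$ because $\sum_\ell \xi_\ell = 1$.) Consequently, $1 - 2\xi_{\ell^*} < 0$ and $1 - 2\xi_\ell > 0$, $\ell \ne \ell^*$.

In the following, it is sufficient to consider only directions $\hat{\xi}$ that satisfy $\hat{\xi}_{\ell^*} > 0$ and $\hat{\xi}_\ell \le 0$, $\ell \ne \ell^*$, or that satisfy $\hat{\xi}_{\ell^*} < 0$ and $\hat{\xi}_\ell \ge 0$, $\ell \ne \ell^*$. This follows from contemplating (13) and (14) and from observing that for a given $\xi$ and given directional magnitudes $\{ |\hat{\xi}_\ell| \}_{\ell \ne \ell^*}$, the left-hand side of (15) is maximized by a $\hat{\xi}$ that satisfies the conditions that we have just mentioned.† From (13) it follows that such direction vectors $\hat{\xi}$ satisfy

\[ |\hat{\xi}_{\ell^*}| = \sum_{\ell \ne \ell^*} |\hat{\xi}_\ell|. \tag{16} \]

† In other words, such a $\hat{\xi}$ produces the "worst-case" left-hand side in (15): if we can show non-positivity for such direction vectors, we have implicitly shown non-positivity for any other direction vector.

Before continuing, let us introduce

\[ \delta' \triangleq -\frac{\hat{\xi}_{\ell^*}^2}{\xi_{\ell^*}} + \sum_{\ell \ne \ell^*} \frac{\hat{\xi}_\ell^2}{1 - \xi_\ell}, \qquad \delta'' \triangleq \frac{\hat{\xi}_{\ell^*}^2}{1 - \xi_{\ell^*}} - \sum_{\ell \ne \ell^*} \frac{\hat{\xi}_\ell^2}{\xi_\ell}. \]

Note that $\sum_\ell \delta_\ell = \delta' + \delta''$ and so, if we can show that $\delta' \le 0$ and $\delta'' \le 0$ then we have verified the desired result (15). The fact $\delta' \le 0$ is a consequence of the chain

\[ \sum_{\ell \ne \ell^*} \frac{\hat{\xi}_\ell^2}{1 - \xi_\ell} \overset{(a)}{\le} \frac{1}{\xi_{\ell^*}} \sum_{\ell \ne \ell^*} \hat{\xi}_\ell^2 \overset{(b)}{\le} \frac{1}{\xi_{\ell^*}} \Big( \sum_{\ell \ne \ell^*} |\hat{\xi}_\ell| \Big)^2 \overset{(c)}{=} \frac{\hat{\xi}_{\ell^*}^2}{\xi_{\ell^*}}, \]

where step (a) follows from $\xi$ being in $\Pi_{[n]}$, which implies that $\xi_{\ell^*} = 1 - \sum_{\ell \ne \ell^*} \xi_\ell$, which in turn implies that $\xi_{\ell^*} \le 1 - \xi_\ell$ for all $\ell \ne \ell^*$. Moreover, step (b) follows from a simple inequality and step (c) follows from (16).

The fact $\delta'' \le 0$ is shown as follows. We start by observing that

\[ (1 - \xi_{\ell^*}) \cdot \sum_{\ell \ne \ell^*} \frac{\hat{\xi}_\ell^2}{\xi_\ell} \overset{(a)}{=} \Big( \sum_{\ell \ne \ell^*} \xi_\ell \Big) \cdot \Big( \sum_{\ell \ne \ell^*} \frac{\hat{\xi}_\ell^2}{\xi_\ell} \Big) \overset{(b)}{\ge} \Big( \sum_{\ell \ne \ell^*} |\hat{\xi}_\ell| \Big)^2 \overset{(c)}{=} \hat{\xi}_{\ell^*}^2, \]

where step (a) follows from $\xi$ being in $\Pi_{[n]}$ (which implies that $\xi_{\ell^*} = 1 - \sum_{\ell \ne \ell^*} \xi_\ell$), where at step (b) we use the Cauchy-Schwarz inequality, and where at step (c) we use (16). Rearranging this inequality, we see that it is equivalent to the inequality $\delta'' \le 0$.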
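The interior case can also be checked numerically. The following is a minimal sketch, assuming the function $S(\xi) = -\sum_\ell \xi_\ell \log \xi_\ell + \sum_\ell (1-\xi_\ell) \log(1-\xi_\ell)$ that is used in Section B of this appendix; the helper names are hypothetical and not part of the paper. It evaluates the sum in (15) for random interior points and random directions satisfying (13), and compares the closed-form curvature with a finite difference.

```python
import math
import random

def S(xi):
    """S(xi) = -sum xi*log(xi) + sum (1-xi)*log(1-xi), for 0 < xi_l < 1."""
    return sum(-x * math.log(x) + (1 - x) * math.log(1 - x) for x in xi)

def second_derivative_terms(xi, direction):
    """Per-coordinate terms delta_l of d^2/dt^2 S(xi + t*direction) at t = 0."""
    return [-(d * d) * (1 - 2 * x) / (x * (1 - x)) for x, d in zip(xi, direction)]

def finite_difference_curvature(xi, direction, h=1e-4):
    """Central second difference of t -> S(xi + t*direction) at t = 0."""
    plus = [x + h * d for x, d in zip(xi, direction)]
    minus = [x - h * d for x, d in zip(xi, direction)]
    return (S(plus) - 2 * S(xi) + S(minus)) / (h * h)

def random_interior_point(n, rng):
    """Random probability vector with all entries strictly positive."""
    w = [rng.uniform(0.05, 1.0) for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]

def random_direction(n, rng):
    """Random direction whose entries sum to zero, as required by (13)."""
    d = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mean = sum(d) / n
    return [v - mean for v in d]

def max_curvature(n=5, trials=2000, seed=0):
    """Largest observed value of the left-hand side of (15)."""
    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(trials):
        xi = random_interior_point(n, rng)
        d = random_direction(n, rng)
        worst = max(worst, sum(second_derivative_terms(xi, d)))
    return worst
```

For $n = 2$ the two terms cancel exactly, so the curvature is zero along every admissible direction; for larger $n$ the sampled values stay non-positive, in line with (15).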



B. The Point $\xi$ is at a Vertex of $\Pi_{[n]}$

Clearly, the direction vector $\hat{\xi}$ must satisfy (13). Moreover, because $\xi$ is at a vertex of $\Pi_{[n]}$, there is an $\ell^* \in [n]$ such that $\xi_{\ell^*} = 1$ and $\xi_\ell = 0$, $\ell \ne \ell^*$, and such that $\hat{\xi}_{\ell^*} < 0$ and $\hat{\xi}_\ell \ge 0$, $\ell \ne \ell^*$. Then we can find an $\varepsilon > 0$ such that $\xi(t) \in \Pi_{[n]}$ for $0 \le t \le \varepsilon$. We will now show that the function $t \mapsto S(\xi(t))$ is concave at $t = 0$.

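We start by plugging in the definition of $\xi(t)$ into $S(\xi(t))$, i.e.,

\[ \begin{aligned} S(\xi(t)) &= -\sum_{\ell} \xi_\ell(t) \log(\xi_\ell(t)) + \sum_{\ell} (1 - \xi_\ell(t)) \log(1 - \xi_\ell(t)) \\ &= -(1 + t\hat{\xi}_{\ell^*}) \log(1 + t\hat{\xi}_{\ell^*}) - \sum_{\ell \ne \ell^*} (t\hat{\xi}_\ell) \log(t\hat{\xi}_\ell) \\ &\quad + (-t\hat{\xi}_{\ell^*}) \log(-t\hat{\xi}_{\ell^*}) + \sum_{\ell \ne \ell^*} (1 - t\hat{\xi}_\ell) \log(1 - t\hat{\xi}_\ell). \end{aligned} \]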

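From this we compute the first-order derivative

\[ \begin{aligned} \frac{d}{dt} S(\xi(t)) &= -\hat{\xi}_{\ell^*} \log(1 + t\hat{\xi}_{\ell^*}) - \hat{\xi}_{\ell^*} - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \log(t\hat{\xi}_\ell) - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \\ &\quad - \hat{\xi}_{\ell^*} \log(-t\hat{\xi}_{\ell^*}) - \hat{\xi}_{\ell^*} - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \log(1 - t\hat{\xi}_\ell) - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \\ &\overset{(a)}{=} -\hat{\xi}_{\ell^*} \log(1 + t\hat{\xi}_{\ell^*}) - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \log(\hat{\xi}_\ell) \\ &\quad - \hat{\xi}_{\ell^*} \log(-\hat{\xi}_{\ell^*}) - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \log(1 - t\hat{\xi}_\ell), \end{aligned} \tag{17} \]

where at step (a) we have used $\sum_\ell \hat{\xi}_\ell = 0$ multiple times. The second-order derivative is then

\[ \frac{d^2}{dt^2} S(\xi(t)) = -\frac{\hat{\xi}_{\ell^*}^2}{1 + t\hat{\xi}_{\ell^*}} + \sum_{\ell \ne \ell^*} \frac{\hat{\xi}_\ell^2}{1 - t\hat{\xi}_\ell}. \]

For $t \downarrow 0$ we obtain

\[ \frac{d^2}{dt^2} S(\xi(t)) \bigg|_{t \downarrow 0} = -\hat{\xi}_{\ell^*}^2 + \sum_{\ell \ne \ell^*} \hat{\xi}_\ell^2 \overset{(a)}{=} -\Big( \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \Big)^2 + \sum_{\ell \ne \ell^*} \hat{\xi}_\ell^2 \overset{(b)}{\le} 0, \]

where at step (a) we have used (13) and where step (b) follows from a simple inequality and the fact that $\hat{\xi}_\ell \ge 0$ for $\ell \ne \ell^*$. Therefore, the function $t \mapsto S(\xi(t))$ is concave at $t = 0$.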

C. The Point $\xi$ is Neither in the Interior nor at a Vertex of $\Pi_{[n]}$

The fact that $\xi$ is neither in the interior nor at a vertex of $\Pi_{[n]}$ means that there is an $\ell^* \in [n]$ such that $0 < \xi_{\ell^*} < 1$. Clearly, the direction vector $\hat{\xi}$ must satisfy (13), plus some additional constraints that are irrelevant for the discussion here. Then we can find an $\varepsilon > 0$ such that $\xi(t) \in \Pi_{[n]}$ for $0 \le t \le \varepsilon$. The concavity of the function $t \mapsto S(\xi(t))$ at $t = 0$ follows then from the observation that, for small non-negative $t$, the second-order derivative of $S(\xi(t))$ w.r.t. $t$ is dominated by the second-order derivative of the expression $-\sum_{\ell: \, \xi_\ell = 0, \, \hat{\xi}_\ell > 0} \xi_\ell(t) \log(\xi_\ell(t))$, a function that is concave in $t$.

Appendix B 
Proof of Lemma 24 

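We obtain the expression in the lemma statement by evaluating $S(\xi(t))$ and the first-order derivative of $S(\xi(t))$ w.r.t. $t$ at $t = 0$. Clearly, $S(\xi(0)) = 0$ and so we can focus on computing the first-order derivative.

Fortunately, in Appendix A-B we have already computed the first-order derivative for exactly the same setup. Namely, from (17) we obtain

\[ \frac{d}{dt} S(\xi(t)) = -\hat{\xi}_{\ell^*} \log(1 + t\hat{\xi}_{\ell^*}) - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \log(\hat{\xi}_\ell) - \hat{\xi}_{\ell^*} \log(-\hat{\xi}_{\ell^*}) - \sum_{\ell \ne \ell^*} \hat{\xi}_\ell \log(1 - t\hat{\xi}_\ell). \]

In the limit $t \downarrow 0$ this simplifies to

\[ \frac{d}{dt} S(\xi(t)) \bigg|_{t \downarrow 0} = -\sum_{\ell \ne \ell^*} \hat{\xi}_\ell \log(\hat{\xi}_\ell) + (-\hat{\xi}_{\ell^*}) \log(-\hat{\xi}_{\ell^*}). \tag{18} \]

This can be rewritten as follows

\[ \frac{d}{dt} S(\xi(t)) \bigg|_{t \downarrow 0} = -|\hat{\xi}_{\ell^*}| \cdot \sum_{\ell \ne \ell^*} \frac{|\hat{\xi}_\ell|}{|\hat{\xi}_{\ell^*}|} \log\left( \frac{|\hat{\xi}_\ell|}{|\hat{\xi}_{\ell^*}|} \right), \]

where we have used $-\hat{\xi}_{\ell^*} = |\hat{\xi}_{\ell^*}|$, $\hat{\xi}_\ell = |\hat{\xi}_\ell|$, $\ell \ne \ell^*$, and $|\hat{\xi}_{\ell^*}| = \sum_{\ell \ne \ell^*} |\hat{\xi}_\ell|$, i.e., $\sum_{\ell \ne \ell^*} |\hat{\xi}_\ell| / |\hat{\xi}_{\ell^*}| = 1$. This verifies the expressions for $S(\xi(t))$ in the lemma statement.

Finally, the non-negativity of the coefficient of $t$ in (4) follows from $|\hat{\xi}_{\ell^*}| \ge |\hat{\xi}_\ell|$, $\ell \ne \ell^*$, which is a consequence of the above-mentioned relation $|\hat{\xi}_{\ell^*}| = \sum_{\ell \ne \ell^*} |\hat{\xi}_\ell|$.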
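The slope formula (18) and its entropy-like rewriting can be compared against a direct evaluation of $S$ near a vertex. The sketch below is only an illustration under the same definition of $S$ as in Appendix A-B; the function names and the test direction are hypothetical choices, not taken from the paper.

```python
import math

def xlogx(x):
    """x * log(x) with the usual convention 0 * log(0) = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def S(xi):
    """S(xi) = -sum xi*log(xi) + sum (1-xi)*log(1-xi)."""
    return sum(-xlogx(x) + xlogx(1.0 - x) for x in xi)

def vertex_slope(direction, star):
    """Right-hand side of (18): slope of t -> S(xi + t*direction) for t -> 0+,
    where xi is the vertex with a one in position `star`."""
    off = [d for l, d in enumerate(direction) if l != star]
    return -sum(xlogx(d) for d in off) + xlogx(-direction[star])

def vertex_slope_entropy_form(direction, star):
    """Same slope, written as |d_star| times an entropy of normalized magnitudes."""
    scale = abs(direction[star])
    probs = [abs(d) / scale for l, d in enumerate(direction) if l != star]
    return -scale * sum(xlogx(p) for p in probs)

def numerical_slope(direction, star, t=1e-6):
    """S(xi(t)) / t for small t; S vanishes at the vertex itself."""
    xi = [t * d for d in direction]
    xi[star] += 1.0
    return S(xi) / t
```

Because the $\log t$ contributions cancel via (13), the ratio $S(\xi(t))/t$ approaches a finite limit, which is what the numerical slope approximates.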

Appendix C 
Proof of Lemma 25 

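Clearly we have $\gamma_{i,j} = 1$ if $j = \sigma(i)$ and $\gamma_{i,j} = 0$ otherwise. From the condition that $\hat{\gamma}$ is such that $\gamma(t) \in \Gamma_{n \times n}$ for small non-negative $t$, it follows that $\sum_j \hat{\gamma}_{i,j} = 0$ for all $i \in \mathcal{I}$ and $\sum_i \hat{\gamma}_{i,j} = 0$ for all $j \in \mathcal{J}$. Moreover, for every $i \in \mathcal{I}$ we have $\hat{\gamma}_{i,j} \le 0$ if $j = \sigma(i)$ and $\hat{\gamma}_{i,j} \ge 0$ otherwise. Then, with $\bar{\sigma} \triangleq \sigma^{-1}$,

\[ \begin{aligned} H_{\mathrm{B}}(\gamma(t)) &\overset{(a)}{=} \frac{1}{2} \sum_i S(\gamma_i(t)) + \frac{1}{2} \sum_j S(\gamma_j(t)) \\ &\overset{(b)}{=} -\frac{t}{2} \sum_i \sum_{j \ne \sigma(i)} \hat{\gamma}_{i,j} \log(\hat{\gamma}_{i,j}) + \frac{t}{2} \sum_i (-\hat{\gamma}_{i,\sigma(i)}) \log(-\hat{\gamma}_{i,\sigma(i)}) \\ &\quad - \frac{t}{2} \sum_j \sum_{i \ne \bar{\sigma}(j)} \hat{\gamma}_{i,j} \log(\hat{\gamma}_{i,j}) + \frac{t}{2} \sum_j (-\hat{\gamma}_{\bar{\sigma}(j),j}) \log(-\hat{\gamma}_{\bar{\sigma}(j),j}) + O(t^2), \end{aligned} \]

where step (a) follows from Lemma 21 and where at step (b) we have used $S(\gamma_i) = 0$, $S(\gamma_j) = 0$, and (18).

We observe that in the above expression there are exactly two terms for every edge $e = (i,j) \in \mathcal{I} \times \mathcal{J}$. Rewriting these summations such that all the main summations are over $i \in \mathcal{I}$, we obtain

\[ \begin{aligned} H_{\mathrm{B}}(\gamma(t)) &= -t \sum_i \sum_{j \ne \sigma(i)} \hat{\gamma}_{i,j} \log(\hat{\gamma}_{i,j}) + t \sum_i (-\hat{\gamma}_{i,\sigma(i)}) \log(-\hat{\gamma}_{i,\sigma(i)}) + O(t^2) \\ &\overset{(a)}{=} t \sum_i |\hat{\gamma}_{i,\sigma(i)}| \cdot \Bigg( -\sum_{j \ne \sigma(i)} \frac{|\hat{\gamma}_{i,j}|}{|\hat{\gamma}_{i,\sigma(i)}|} \log\left( \frac{|\hat{\gamma}_{i,j}|}{|\hat{\gamma}_{i,\sigma(i)}|} \right) \Bigg) + O(t^2), \end{aligned} \]

which is the first display equation in the lemma statement. Here, at step (a) we have used $-\hat{\gamma}_{i,\sigma(i)} = |\hat{\gamma}_{i,\sigma(i)}|$, $\hat{\gamma}_{i,j} = |\hat{\gamma}_{i,j}|$, $j \ne \sigma(i)$, and $|\hat{\gamma}_{i,\sigma(i)}| = \sum_{j \ne \sigma(i)} |\hat{\gamma}_{i,j}|$, i.e., $\sum_{j \ne \sigma(i)} |\hat{\gamma}_{i,j}| / |\hat{\gamma}_{i,\sigma(i)}| = 1$.

The non-negativity of the coefficient of $t$ in the above expression follows from $|\hat{\gamma}_{i,\sigma(i)}| \ge |\hat{\gamma}_{i,j}|$, $j \ne \sigma(i)$, which is a consequence of the above-mentioned relation $|\hat{\gamma}_{i,\sigma(i)}| = \sum_{j \ne \sigma(i)} |\hat{\gamma}_{i,j}|$.

On the other hand, rewriting these summations such that all the main summations are over $j \in \mathcal{J}$, we obtain the second display equation in the lemma statement.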

Appendix D 
Proof of Theorem 26 

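From the assumptions in the theorem statement it follows that $|\hat{\gamma}_{i,\sigma(i)}| = -\hat{\gamma}_{i,\sigma(i)}$ for all $i \in \mathcal{I}$ and that $|\hat{\gamma}_{i,j}| = \hat{\gamma}_{i,j}$ for all $i \in \mathcal{I}$, $j \in \mathcal{J} \setminus \{ \sigma(i) \}$ (see also the proof of Lemma 25 in Appendix C). Then,

\[ \begin{aligned} U_{\mathrm{B}}(\gamma(t)) &\overset{(a)}{=} -\sum_i (1 + t\hat{\gamma}_{i,\sigma(i)}) \log(\theta_{i,\sigma(i)}) - \sum_i \sum_{j \ne \sigma(i)} (t\hat{\gamma}_{i,j}) \log(\theta_{i,j}) \\ &\overset{(b)}{=} -\sum_i \log(\theta_{i,\sigma(i)}) - t \sum_i \sum_{j \ne \sigma(i)} |\hat{\gamma}_{i,j}| \log\left( \frac{\theta_{i,j}}{\theta_{i,\sigma(i)}} \right) \\ &\overset{(c)}{=} C - t \sum_i \sum_{j \ne \sigma(i)} |\hat{\gamma}_{i,j}| \log\left( \frac{\theta_{i,j}}{\theta_{i,\sigma(i)}} \right), \end{aligned} \]

where at step (a) we have used Corollary 15, where at step (b) we have used that $\sum_j \hat{\gamma}_{i,j} = 0$ holds for every $i \in \mathcal{I}$, i.e., that $-\hat{\gamma}_{i,\sigma(i)} = \sum_{j \ne \sigma(i)} \hat{\gamma}_{i,j} = \sum_{j \ne \sigma(i)} |\hat{\gamma}_{i,j}|$ holds for every $i \in \mathcal{I}$, and where at step (c) we have defined $C \triangleq -\sum_i \log(\theta_{i,\sigma(i)})$. (Note that there is no $O(t^2)$ term in the above expressions.) Then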



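\[ \begin{aligned} F_{\mathrm{B}}(\gamma(t)) &\overset{(a)}{=} U_{\mathrm{B}}(\gamma(t)) - H_{\mathrm{B}}(\gamma(t)) \\ &\overset{(b)}{=} C - t \sum_i \sum_{j \ne \sigma(i)} |\hat{\gamma}_{i,j}| \log\left( \frac{\theta_{i,j}}{\theta_{i,\sigma(i)}} \right) - t \sum_i |\hat{\gamma}_{i,\sigma(i)}| \cdot \Bigg( -\sum_{j \ne \sigma(i)} \frac{|\hat{\gamma}_{i,j}|}{|\hat{\gamma}_{i,\sigma(i)}|} \log\left( \frac{|\hat{\gamma}_{i,j}|}{|\hat{\gamma}_{i,\sigma(i)}|} \right) \Bigg) + O(t^2) \\ &\overset{(c)}{=} C - t \sum_i \sum_{i' \ne i} |\hat{\gamma}_{i,\sigma(i')}| \log\left( \frac{\theta_{i,\sigma(i')}}{\theta_{i,\sigma(i)}} \right) - t \sum_i |\hat{\gamma}_{i,\sigma(i)}| \cdot \Bigg( -\sum_{i' \ne i} \frac{|\hat{\gamma}_{i,\sigma(i')}|}{|\hat{\gamma}_{i,\sigma(i)}|} \log\left( \frac{|\hat{\gamma}_{i,\sigma(i')}|}{|\hat{\gamma}_{i,\sigma(i)}|} \right) \Bigg) + O(t^2) \\ &\overset{(d)}{=} C - t \sum_i \sum_{i' \ne i} \mu_i \cdot p_{i,i'} \cdot \big[ -\log(p_{i,i'}) + T_{i,i'} \big] + O(t^2), \end{aligned} \tag{19} \]

where at step (a) we have used Corollary 15, where at step (b) we have inserted the above expression for $U_{\mathrm{B}}(\gamma(t))$ and the expression for $H_{\mathrm{B}}(\gamma(t))$ from Lemma 25, where at step (c) we have replaced the summations over $j \in \mathcal{J}$, $j \ne \sigma(i)$, by summations over $i' \in \mathcal{I}$, $\sigma(i') \ne \sigma(i)$, i.e., by summations over $i' \in \mathcal{I}$, $i' \ne i$, and where at step (d) we have introduced the definitions

\[ \mu_i \triangleq |\hat{\gamma}_{i,\sigma(i)}|, \tag{20} \]
\[ p_{i,i'} \triangleq \frac{|\hat{\gamma}_{i,\sigma(i')}|}{|\hat{\gamma}_{i,\sigma(i)}|}, \tag{21} \]
\[ Q_{i,i'} \triangleq \mu_i \cdot p_{i,i'} = |\hat{\gamma}_{i,\sigma(i')}|, \tag{22} \]
\[ T_{i,i'} \triangleq \log\left( \frac{\theta_{i,\sigma(i')}}{\theta_{i,\sigma(i)}} \right), \tag{23} \]

for all $(i,i') \in \mathcal{I} \times \mathcal{I}$ with $i \ne i'$. One can verify that the assumptions on $\hat{\gamma}$ imply that

\[ \sum_{i' \ne i} p_{i,i'} = 1 \quad (\text{for all } i \in \mathcal{I}), \]
\[ \sum_{i' \ne i} Q_{i,i'} = \mu_i \quad (\text{for all } i \in \mathcal{I}), \]
\[ \sum_{i \ne i'} Q_{i,i'} = \mu_{i'} \quad (\text{for all } i' \in \mathcal{I}). \]

Fig. 9. Trellis for the random walk described in Appendix D. (Here n = 5.) Highlighted is an instance of a possible walk.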

In order to obtain the theorem statement, we need to maximize the coefficient of $(-t)$ in (19). Before doing this, let us quickly discuss the meaning of this coefficient.

Namely, consider the trellis in Figure 9 with state space $\mathcal{I}$ (i.e., with $n$ states) and where a trellis section has a branch from state $i \in \mathcal{I}$ to state $i' \in \mathcal{I}$ if and only if $i \ne i'$. It is straightforward to see that there is a bijection between, on the one hand, the set of all left-to-right walks in the time-invariant trellis shown in Figure 9, and, on the other hand, the set of backtrackless walks in $\mathsf{N}(\theta)$ (cf. Figure 1) that were mentioned after Lemma 25. In particular, going from state $i \in \mathcal{I}$ to state $i' \in \mathcal{I} \setminus \{ i \}$ in this trellis corresponds to the two half-steps of going from node $i \in \mathcal{I}$ to node $\sigma(i') \in \mathcal{J}$ and then to node $i' \in \mathcal{I}$ in $\mathsf{N}(\theta)$. With this, translating (backtrackless) random walks to left-to-right random walks in the trellis in Figure 9, we obtain that

• $\mu_i$ is the probability of being in state $i$,

• $p_{i,i'}$ is the probability of going to state $i' \ne i$, conditioned on being in state $i$,

• $Q_{i,i'}$ is the probability of being in state $i$ and then going to state $i' \ne i$,

• $-\sum_i \sum_{i' \ne i} \mu_i \, p_{i,i'} \log(p_{i,i'})$ is the entropy rate of (the Markov chain corresponding to) the random walk on this trellis,

• $T_{i,i'}$ is a branch metric,

• $\sum_i \sum_{i' \ne i} \mu_i \, p_{i,i'} \, T_{i,i'}$ is the average branch metric of the random walk on this trellis,

and maximizing the coefficient of $(-t)$ in the above expression for $F_{\mathrm{B}}(\gamma(t))$ means to find the (time-invariant) left-to-right random walk on this trellis that maximizes

\[ \sum_i \sum_{i' \ne i} \mu_i \cdot p_{i,i'} \cdot \big[ -\log(p_{i,i'}) + T_{i,i'} \big], \]

i.e., the sum of the entropy rate and the average branch metric of the random walk. (In statistical physics terms, this expression can be considered to be some negative free energy function.)

The purpose of rewriting the above expression in the way we did was to bring it very close to the notation used in [65, Lemma 44], which solves exactly the above maximization problem. (Note that related problems were also solved in [66] and [67].)

As was shown in [65, Lemma 44], the maximal value of

\[ \sum_i \sum_{i' \ne i} \mu_i \cdot p_{i,i'} \cdot \big[ -\log(p_{i,i'}) + T_{i,i'} \big] \]

is $\log(\rho)$ and is attained by

\[ \mu^*_i = \kappa \cdot u^{\mathrm{L}}_i \cdot u^{\mathrm{R}}_i, \]

\[ p^*_{i,i'} = \begin{cases} \dfrac{A_{i,i'} \cdot u^{\mathrm{R}}_{i'}}{\rho \cdot u^{\mathrm{R}}_i} & (\text{if } i \ne i') \\ 0 & (\text{otherwise}) \end{cases} \]

\[ Q^*_{i,i'} = \begin{cases} \mu^*_i \cdot p^*_{i,i'} & (\text{if } i \ne i') \\ 0 & (\text{otherwise}) \end{cases} \]

where $A$, $\rho$, $u^{\mathrm{L}}$, and $u^{\mathrm{R}}$ are defined in the theorem statement, and where $\kappa$ is a normalization constant such that $\sum_i \mu^*_i = 1$. Note that $A$, called the noisy adjacency matrix in [65, Lemma 44], is such that $A_{i,i'} = \exp(T_{i,i'})$ for $i \ne i'$ and such that $A_{i,i} = 0$.

Because $A$ contains only non-negative entries, $\rho$ is the so-called Perron eigenvalue of $A$, and $u^{\mathrm{L}}$ and $u^{\mathrm{R}}$ are the so-called left and right, respectively, Perron eigenvectors of $A$; one can show that these two vectors contain only non-negative entries.

Translating this result back using (20), (21), and (22), we obtain the result given in the theorem statement.
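The maximization result quoted from [65, Lemma 44] can be illustrated numerically. The following sketch is an assumption-laden illustration, not the construction from [65]: it takes the standard form of the maximizing walk, $p_{i,i'} = A_{i,i'} u^{\mathrm{R}}_{i'} / (\rho \, u^{\mathrm{R}}_i)$ and $\mu_i \propto u^{\mathrm{L}}_i u^{\mathrm{R}}_i$, computes the Perron quantities by plain power iteration (which assumes that $A$ is primitive, as is the case for $n \ge 3$ with finite branch metrics), and evaluates the objective. All function names and the branch metrics in the test are hypothetical.

```python
import math

def perron(A, iters=5000):
    """Power iteration for a primitive non-negative matrix A.
    Returns (rho, right eigenvector, left eigenvector), each vector summing to 1."""
    n = len(A)

    def iterate(mult):
        v = [1.0 / n] * n
        rho = 0.0
        for _ in range(iters):
            w = mult(v)
            rho = sum(w)              # v sums to one, so this estimates the eigenvalue
            v = [x / rho for x in w]
        return rho, v

    rho, right = iterate(lambda v: [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)])
    _, left = iterate(lambda v: [sum(v[k] * A[k][i] for k in range(n)) for i in range(n)])
    return rho, right, left

def optimal_walk(T):
    """Candidate maximizer for branch metrics T (diagonal entries are ignored)."""
    n = len(T)
    A = [[math.exp(T[i][k]) if i != k else 0.0 for k in range(n)] for i in range(n)]
    rho, u_right, u_left = perron(A)
    p = [[A[i][k] * u_right[k] / (rho * u_right[i]) for k in range(n)] for i in range(n)]
    kappa = 1.0 / sum(u_left[i] * u_right[i] for i in range(n))
    mu = [kappa * u_left[i] * u_right[i] for i in range(n)]
    return rho, mu, p

def objective(mu, p, T):
    """Entropy rate plus average branch metric of the walk (mu, p)."""
    n = len(T)
    return sum(mu[i] * p[i][k] * (-math.log(p[i][k]) + T[i][k])
               for i in range(n) for k in range(n) if i != k)
```

For this candidate the objective collapses to $\log(\rho)$ because the terms $\log u^{\mathrm{R}}_i - \log u^{\mathrm{R}}_{i'}$ average to zero under the stationary distribution.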

Appendix E 
Proof of Lemma 29 

We start by formulating the SPA message update rule for function nodes $g_i$, $i \in \mathcal{I}$, at iteration $t \ge 1$. Following [28]-[30], we have for every $i \in \mathcal{I}$, every $j \in \mathcal{J}$, and every $a_{i,j} \in \mathcal{A}_{i,j}$



7^i,*'(a»j) - ^T- • XI ■^*("*) ■ n ^^'^■' ^^(«'J'), 



c, 



ii'i^3 



where C^j is some suitable normalization constant. Conse- 
quently, the update of the likelihood ratio reads 



-r^(*)A^i(0)_^-P^^"'^'^^"^^^ 



trt*-!) 



(flj J-" ) 



A,; 



»,J 






(b) 



t7(*-l) 



-1 



9- ■ ^—' 



^*j 






where at step (a) we have used $\mathcal{A}_i = \{ u_j \mid j \in \mathcal{J} \}$ for simplifying the numerator, and where at step (b) we have used the definition of $\overleftarrow{\Lambda}^{(t-1)}_{i,j'}$, $j' \ne j$. This yields the first expression in the lemma statement. The second expression is obtained analogously by considering the SPA message update rule for function nodes $g_j$, $j \in \mathcal{J}$, at iteration $t \ge 1$.

Now we turn our attention to computing the beliefs at the function nodes $g_i$, $i \in \mathcal{I}$, at iteration $t \ge 0$. Following [28]-[30], we have for every $i \in \mathcal{I}$ and every $a_i \in \mathcal{A}_i$,






a 



where Ci is chosen such that J2a Pi 

ai = u-j, j G J, we get 



1. In particular, for 



(t) 



a 
1 

1 



/.(«o- iikko) -n 






^- liW'mMh 



Because Ci and the expression in the parentheses are inde- 
pendent of j, we have just verified the third expression in 
the lemma statement. The fourth expression in the lemma 
statement is obtained analogously by considering the beliefs 
at function nodes gj, j G J^, at iteration t ^ 1. 

Appendix F 
Proof of Lemma 31

The pseudo-dual function of the Bethe free energy function is given by evaluating the Lagrangian of the Bethe free energy function at a stationary point [37]. Therefore, in a first step, we want to write down the Lagrangian of the Bethe free energy function. To that end, we take the Bethe free energy function as in Definition 10, i.e.,

\[ F_{\mathrm{B}}(\beta) = \sum_i U_{\mathrm{B},i}(\beta_i) + \sum_j U_{\mathrm{B},j}(\beta_j) - \sum_i H_{\mathrm{B},i}(\beta_i) - \sum_j H_{\mathrm{B},j}(\beta_j) + \sum_e H_{\mathrm{B},e}(\beta_e). \]

(For the purposes of this appendix, the expression for $F_{\mathrm{B}}$ in Definition 10 is somewhat more convenient than the one in Lemma 14.)

Now, introducing a Lagrange multiplier for the edge con- 
sistency constraints (but not for the other constraints imposed 
by the local marginal polytope B, cf. Definition 9), we obtain 
the relevant Lagrangian 



iBcthc({/3J, {/3,}, {/3e}, {X}, {X}) 
= FB({A},{/3,},{/3e}) 




Because Fb is convex in {f3i}i and {l3j}j, but concave in 
{/3e}e, the pseudo-dual function of Fb is given by 



.t- , .-r> 



FLU{^e},{K}AV^},{V3}AVe}) 



max mm 

{/3J {/3.}, {13,} 



iBcthc({A:},{/3,},{;9e}, 



{\e},{\e},{'n^},{V3},{Ve}), 



where the maximization/minimization is over all $\{ \beta_e \}_e$, $\{ \beta_i \}_i$, $\{ \beta_j \}_j$ that satisfy the constraints imposed by the local marginal polytope $\mathcal{B}$, except for the edge consistency constraints. We obtain the maximizing $\{ \beta_e \}_e$ and the minimizing $\{ \beta_i \}_i$, $\{ \beta_j \}_j$ by setting suitable partial derivatives to zero. This yields



e: i{e)—i 

^ e:j(e)=j 

Pe,a, = y ■ exp I Ae,aJ ' CXp i XeM.j , 



where i{e) and j{e) give the label of the, respectively, left 
and right vertex to which e is incident, and where {Zi]i, 
{Zj}j, and {Ze}e are suitable normalization constants such 
that relevant sums are equal to one. 

Now, plugging these beliefs into the Lagrangian, we obtain (after cancelling several terms) the expression



^l;thc({Ae},{Aj) 

i y Oi e: i{e)=i j 

3 \aj e:j{e)=j 

+ ^ log I E '^'^P ( ^^'"= + -^e.ae j I ■ 
e \ Oe / 

We proceed by using some details of the definition of $\mathsf{N}(\theta)$. Namely, using the definition of the local function nodes and taking advantage of the binary alphabet $\mathcal{A}_e = \{ 0, 1 \}$, $e \in \mathcal{E}$, we obtain (after some simplifications)



at a vertex of $\Gamma_{n \times n}$, whereas the second subsection considers the case where the global minimum of $F_{\mathrm{B}}$ is achieved in the interior of $\Gamma_{n \times n}$.

For ease of reference, we reproduce here the SPA message update rules from Lemma 29, i.e.,






E, 



jVj 









_ ,(,_!) , i ^ 1, (j, j) e X X J, (24) 



2 J' 






t)^ 



i^ 1, ii,j)elxj. (25) 



,-(- 



^Btthe({X},{X}) 



Ji^j ■ exp ( A(ij)^i 






In both parts of this appendix, the main task will be to exhibit 
a contraction operation of a suitably chosen subset of the SPA 
messages. 
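To make the objects of this appendix concrete, the following sketch runs a sum-product iteration on message ratios for a strictly positive matrix and evaluates the Bethe free energy at the resulting beliefs. It is an illustration under explicit assumptions and not a transcription of Lemma 29: the update is derived from the standard SPA for a factor graph in which every row and column node requires exactly one incident edge to be active and contributes a weight $\sqrt{\theta_{i,j}}$; each message is represented by the ratio of its value at 1 to its value at 0; the expression for $F_{\mathrm{B}}$ is the one used in Appendices D and H. All function names are hypothetical.

```python
import math
from itertools import permutations

def xlogx(x):
    return 0.0 if x == 0.0 else x * math.log(x)

def permanent(theta):
    """Brute-force permanent, only meant for very small matrices."""
    n = len(theta)
    return sum(math.prod(theta[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def spa_bethe_permanent(theta, iters=500):
    """Returns (exp(-F_B(gamma)), gamma) after `iters` SPA iterations.
    Assumes all entries of theta are strictly positive."""
    n = len(theta)
    r = [[math.sqrt(theta[i][j]) for j in range(n)] for i in range(n)]
    left = [[1.0] * n for _ in range(n)]    # ratios of left-going messages
    right = [[1.0] * n for _ in range(n)]   # ratios of right-going messages
    for _ in range(iters):
        for i in range(n):
            for j in range(n):
                denom = sum(r[i][k] * left[i][k] for k in range(n) if k != j)
                right[i][j] = r[i][j] / denom
        for j in range(n):
            for i in range(n):
                denom = sum(r[k][j] * right[k][j] for k in range(n) if k != i)
                left[i][j] = r[i][j] / denom
    # Edge beliefs: probability that edge (i, j) is part of the matching.
    gamma = [[right[i][j] * left[i][j] / (1.0 + right[i][j] * left[i][j])
              for j in range(n)] for i in range(n)]
    energy = -sum(gamma[i][j] * math.log(theta[i][j]) for i in range(n) for j in range(n))
    entropy = sum(-xlogx(g) + xlogx(1.0 - g) for row in gamma for g in row)
    return math.exp(entropy - energy), gamma
```

For the all-one $3 \times 3$ matrix the iteration is stationary from the start, with all beliefs equal to $1/3$; for a nearly uniform positive matrix the beliefs settle on a doubly stochastic matrix and the resulting value stays below the permanent.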

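A. Global Minimum of $F_{\mathrm{B}}$ is Achieved at a Vertex of $\Gamma_{n \times n}$

Let $\gamma \in \Gamma_{n \times n}$ be the vertex of $\Gamma_{n \times n}$ that uniquely minimizes $F_{\mathrm{B}}$. This means that $\gamma$ corresponds to the permutation $\sigma_\gamma$. (In the following statement we will use the short-hands $\sigma = \sigma_\gamma$ and $\bar{\sigma} = \sigma_\gamma^{-1}$.)

From (24) it follows that $\overrightarrow{\Lambda}^{(t)}_{i,\sigma(i)} = 1 / \overrightarrow{V}^{(t)}_{i,\sigma(i)}$, $i \in \mathcal{I}$, can be written as†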

-f(t) 

Aj,CT(j) 



1 



E 



V,' 



-(t-i) 



t ^ 1, i e X. 



^ log (1 + exp ((Xa - X,o) + (X 



Ae,0, 



On the other hand, for $i \in \mathcal{I}$ and $j \ne \sigma(i)$ the SPA message update equation in (25) implies



From the results in [9] it follows that at a fixed point of 
the SPA, the quantity A(j jj q ^ ^(i.j),i represents the log- 
likelihood ratio of the left-going message along the edge (i, j), 
and the quantity \{i,j),o— ^(i.j),i represents the log-likelihood 
ratio of the right-going message along the edge (i, j). Clearly, 
for every edge (j, j) e I x J', these quantities are related to 
the inverse likelihood ratios by 



v. 






'I.] 






t 



(t-1) 












Vjj — exp (^ A(ij)^i - A(ij)^oj 7 
Vj,j = exp(^A(ij)^i 
respectively. Therefore, we get 

= -Eiog(V^-V,,,)-Eiog 



€ 



^«j 



V^^Wj • vi*(^.)'] 



-> 



(t-1) 



•A^(^.;;., i^l, ^eI, j^a(i), 



■•V. 



where the inequality follows from the fact that all terms in the 
summation X]i'=ti a(i) ^^ non-negative. Then, combining the 
two above expressions, we obtain 



»j / 1 



E 



which is the expression in the lemma statement. 

Although the interpretation of the log-likelihood ratios was 
given by looking at fixed points of the SPA, it is not difficult 
to see that we can evaluate this last expression for any set of 
inverse likelihood ratios. 

Appendix G 
Proof of Theorem 32 

This appendix has two subsections. The first subsection con- 
siders the case where the global minimum of Fb is achieved 



Aj,CT(j) ^ 

Rearranging terms, we obtain 

Ai,CT(j) 



^(*-i) 



V^^V^^ "'■'^^' 



i ^ 1, i el. 






t ^ 1, i e I. 



† For simplicity, because $j$ does not appear on the left-hand side of this equation, we use $j$ as a summation variable on the right-hand side. This is in contrast to (24) where $j$ appears on the left-hand side and where the summation variable on the right-hand side is $j'$.

Now, for every $t \ge 0$, consider the length-$n$ vector $\mathbf{m}^{(t)}$ whose $i$-th entry is $\overrightarrow{\Lambda}^{(t)}_{i,\sigma(i)} / \sqrt{\theta_{i,\sigma(i)}}$. Grouping several of the above inequalities together, we obtain the vector inequality

\[ \mathbf{m}^{(t)} \le A \cdot \mathbf{m}^{(t-1)}, \quad t \ge 1, \tag{26} \]

where the vector inequality has to be understood component-wise, and where the $n \times n$ matrix $A$ was defined in Theorem 26 for the vertex $\gamma$ of $\Gamma_{n \times n}$. Let $\rho$ be the maximal (real) eigenvalue of $A$. Then, Corollary 27 and the assumption that $\gamma$ is the unique minimizer of $F_{\mathrm{B}}$ allow us to conclude that $\rho < 1$. However, because $\rho < 1$ implies that all eigenvalues of $A$ have magnitude strictly smaller than 1, the update equation in (26) represents a contraction, and so

\[ \big\| \mathbf{m}^{(t)} \big\| \xrightarrow{t \to \infty} 0. \]

Therefore,

\[ \overrightarrow{\Lambda}^{(t)}_{i,\sigma(i)} \xrightarrow{t \to \infty} 0, \quad i \in \mathcal{I}. \]

A similar argument shows that

\[ \overleftarrow{\Lambda}^{(t)}_{\bar{\sigma}(j),j} \xrightarrow{t \to \infty} 0, \quad j \in \mathcal{J}. \]

Finally, from (24) and (25) and the above results it follows that

\[ \overrightarrow{V}^{(t)}_{i,j}, \ \overleftarrow{V}^{(t)}_{i,j} \xrightarrow{t \to \infty} 0, \quad i \in \mathcal{I}, \ j \in \mathcal{J}, \ j \ne \sigma(i). \]

All these quantities converge to zero exponentially fast.
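The contraction argument can be visualized with a small numerical experiment. The sketch below is illustrative only: the matrix is a hypothetical non-negative matrix with zero diagonal and spectral radius 1/2, not one derived from a particular $\theta$. Since all entries are non-negative, iterating the component-wise bound (26) gives $\mathbf{m}^{(t)} \le A^t \mathbf{m}^{(0)}$, and the right-hand side decays geometrically.

```python
def mat_vec(A, v):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def spectral_radius(A, iters=2000):
    """Power-iteration estimate of the Perron eigenvalue of a primitive
    non-negative matrix."""
    v = [1.0 / len(A)] * len(A)
    rho = 0.0
    for _ in range(iters):
        w = mat_vec(A, v)
        rho = sum(w)                  # v sums to one
        v = [x / rho for x in w]
    return rho

def iterate_upper_bound(A, m0, steps):
    """Iterate m <- A m and record the max-norm after every step."""
    norms = []
    m = list(m0)
    for _ in range(steps):
        m = mat_vec(A, m)
        norms.append(max(abs(x) for x in m))
    return norms

# Hypothetical example: every row sums to 0.5, so the Perron eigenvalue is 0.5.
A_EXAMPLE = [[0.0, 0.3, 0.2],
             [0.4, 0.0, 0.1],
             [0.2, 0.3, 0.0]]
```

With this example the max-norm shrinks by a factor of at most 1/2 per step, so after 50 steps the bound is below $10^{-12}$ for moderate starting vectors.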

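When $F_{\mathrm{B}}$ achieves its minimum in the interior of $\Gamma_{n \times n}$, then we have equality between $F_{\mathrm{B}}$ and $F^{\#}_{\mathrm{Bethe}}$ at stationary points of the SPA. However, we also have equality in the present case. Namely, evaluating $F^{\#}_{\mathrm{Bethe}}$ (cf. Lemma 31) for the above messages, we obtain

\[ F^{\#}_{\mathrm{Bethe}}\big( \{ \overrightarrow{V}^{(t)}_{i,j} \}, \{ \overleftarrow{V}^{(t)}_{i,j} \} \big) \xrightarrow{t \to \infty} -\sum_i \log(\theta_{i,\sigma(i)}), \]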

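for every $(i,j) \in \mathcal{E}$. Note that these SPA fixed point inverse likelihood ratios satisfy $0 < \overrightarrow{V}_{i,j} < \infty$ and $0 < \overleftarrow{V}_{i,j} < \infty$, otherwise the assumption that we are dealing with an interior point of $\Gamma_{n \times n}$ would be violated.

It follows from the message gauge invariance mentioned in Remark 30 that, for any positive real number $C$, the inverse likelihoods $\{ C \cdot \overrightarrow{V}_{i,j} \}_{i,j}$, $\{ C^{-1} \cdot \overleftarrow{V}_{i,j} \}_{i,j}$ also constitute a fixed point of the SPA update rules. We will use this fact later on.

On the other hand, let $\{ \overrightarrow{V}^{(t)}_{i,j} \}_{i,j}$, $\{ \overleftarrow{V}^{(t)}_{i,j} \}_{i,j}$ be a set of inverse likelihoods obtained by running the SPA on $\mathsf{N}(\theta)$ according to the SPA update rules in Lemma 29. In the following, we will not work with $\{ \overrightarrow{V}^{(t)}_{i,j} \}_{i,j}$, $\{ \overleftarrow{V}^{(t)}_{i,j} \}_{i,j}$ directly, but with $\{ \overrightarrow{\varepsilon}^{(t)}_{i,j} \}_{i,j}$, $\{ \overleftarrow{\varepsilon}^{(t)}_{i,j} \}_{i,j}$, which are implicitly defined by the equations

\[ \overrightarrow{V}^{(t)}_{i,j} = \big( 1 + \overrightarrow{\varepsilon}^{(t)}_{i,j} \big) \cdot \overrightarrow{V}_{i,j}, \tag{29} \]
\[ \overleftarrow{V}^{(t)}_{i,j} = \big( 1 + \overleftarrow{\varepsilon}^{(t)}_{i,j} \big) \cdot \overleftarrow{V}_{i,j}. \tag{30} \]

(Note that $-1 < \overrightarrow{\varepsilon}^{(t)}_{i,j} < \infty$ and $-1 < \overleftarrow{\varepsilon}^{(t)}_{i,j} < \infty$.) Clearly, $\{ \overrightarrow{\varepsilon}^{(t)}_{i,j} \}_{i,j}$, $\{ \overleftarrow{\varepsilon}^{(t)}_{i,j} \}_{i,j}$ can be considered to be a "measure" of the distance of the SPA messages to the fixed-point messages. In particular, we have established convergence of the SPA if we can show that these values converge to zero for $t \to \infty$.

In a first step, we express the SPA message update rules in terms of $\{ \overrightarrow{\varepsilon}^{(t)}_{i,j} \}_{i,j}$ and $\{ \overleftarrow{\varepsilon}^{(t)}_{i,j} \}_{i,j}$.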



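which indeed equals $F_{\mathrm{B}}(\gamma)$. From $\rho < 1$ and $F_{\mathrm{B}}(\gamma) = -\log(\mathrm{perm}_{\mathrm{B}}(\theta))$ it also follows that

\[ \Big| \exp\Big( -F^{\#}_{\mathrm{Bethe}}\big( \{ \overrightarrow{V}^{(t)}_{i,j} \}, \{ \overleftarrow{V}^{(t)}_{i,j} \} \big) \Big) - \mathrm{perm}_{\mathrm{B}}(\theta) \Big| \le C \cdot e^{-\nu \cdot t} \]

for suitable constants $C, \nu \in \mathbb{R}_{>0}$.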

B. Global Minimum of $F_{\mathrm{B}}$ is Achieved in the Interior of $\Gamma_{n \times n}$

In Corollary 23 we established that the Bethe free energy function of $\mathsf{N}(\theta)$ is convex, i.e., it does not have stationary points besides the global minimum. Therefore, using a theorem by Yedidia, Freeman, Weiss [9], we know that fixed points of the SPA correspond to the global minimum of the Bethe free energy function.

Let $\{ \overrightarrow{V}_{i,j} \}_{i,j}$, $\{ \overleftarrow{V}_{i,j} \}_{i,j}$ be inverse likelihood ratios that constitute a fixed point of the SPA update rules in Lemma 29. As such, these inverse likelihoods must satisfy



(27) 
(28) 




Lemma 65 For the right-going messages it holds that 




For the left-going messages it holds that 



i-,3 



V(*) 



Y- /77— it '*^ ^ (*) 



1+ s 



i,3 



(31) 
(32) 



(33) 
(34) 



i'^i V "1' -3 " 1,3 



Proof: Let us establish (32). The expression in (34) then follows analogously. We compute



(b) 



4-(t-l) 



(c) 



^^J 



E 



j'#i 



V, 



■ij 






") 



(d) 



'ij 




7-V.,,)-(i + t^^ 



where at step (a) we have used (29), where at step (b) we have used (24), where at step (c) we have used (30), where at step (d) we have used (31), and where at step (e) we have used (27). Dividing both sides by $\overrightarrow{V}_{i,j}$, and then subtracting 1 from both sides, yields the expression in (32). □

Note that $\overrightarrow{\delta}^{(t)}_{i,j}$ is a weighted arithmetic average of the error values $\{ \overleftarrow{\varepsilon}^{(t-1)}_{i,j'} \}_{j' \ne j}$ and that $\overleftarrow{\delta}^{(t)}_{i,j}$ is a weighted arithmetic average of the error values $\{ \overrightarrow{\varepsilon}^{(t)}_{i',j} \}_{i' \ne i}$.

Note also that the expressions in (32) and (34) have the following peculiarity. Namely, solving $\varepsilon = -\delta / (1 + \delta)$ for $\delta$ we obtain $\delta = -\varepsilon / (1 + \varepsilon)$, which is structurally the same expression as the first expression but with the roles of $\varepsilon$ and $\delta$ interchanged.
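The peculiarity just mentioned says that the map $x \mapsto -x / (1 + x)$ is its own inverse on $(-1, \infty)$. A minimal sketch, with hypothetical function names, that checks this and also shows how the map transports an interval of error values:

```python
def flip(x):
    """The map x -> -x / (1 + x), defined for x > -1."""
    if x <= -1.0:
        raise ValueError("x must be larger than -1")
    return -x / (1.0 + x)

def image_of_interval(lo, hi):
    """Image of [lo, hi] under flip; the map is decreasing, so endpoints swap."""
    return flip(hi), flip(lo)
```

For example, the interval $[0, 2]$ is mapped to $[-2/3, 0]$, and applying the map once more returns $[0, 2]$. This is the mechanism by which non-negative errors in one direction become non-positive, bounded errors in the other direction.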

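Lemma 66 Fix an iteration number $t \ge 1$. Taking advantage of the message gauge invariance that was mentioned in Remark 30, we can rescale the left-going and right-going fixed-point messages such that all $\{ \overleftarrow{\varepsilon}^{(t-1)}_{i,j} \}_{i,j}$ are non-negative. With this we define the numbers $\overleftarrow{\varepsilon}^{(t-1)}_{\max}$ and $\overleftarrow{\varepsilon}^{(t)}_{\max}$ to be the smallest numbers that satisfy

\[ \overleftarrow{\varepsilon}^{(t-1)}_{i,j} \le \overleftarrow{\varepsilon}^{(t-1)}_{\max}, \quad (i,j) \in \mathcal{E}, \]
\[ \overleftarrow{\varepsilon}^{(t)}_{i,j} \le \overleftarrow{\varepsilon}^{(t)}_{\max}, \quad (i,j) \in \mathcal{E}. \]

Then

\[ 0 \le \overleftarrow{\varepsilon}^{(t)}_{i,j} \le \overleftarrow{\varepsilon}^{(t)}_{\max} \le \overleftarrow{\varepsilon}^{(t-1)}_{\max}, \quad (i,j) \in \mathcal{E}. \]

Proof: It follows immediately from (31) that

\[ 0 \le \overrightarrow{\delta}^{(t)}_{i,j} \le \overleftarrow{\varepsilon}^{(t-1)}_{\max}, \quad (i,j) \in \mathcal{E}, \]

and so, because of (32), we have

\[ -1 < -\frac{\overleftarrow{\varepsilon}^{(t-1)}_{\max}}{1 + \overleftarrow{\varepsilon}^{(t-1)}_{\max}} \le \overrightarrow{\varepsilon}^{(t)}_{i,j} \le 0, \quad (i,j) \in \mathcal{E}. \tag{35} \]

Using (33), this implies

\[ -1 < -\frac{\overleftarrow{\varepsilon}^{(t-1)}_{\max}}{1 + \overleftarrow{\varepsilon}^{(t-1)}_{\max}} \le \overleftarrow{\delta}^{(t)}_{i,j} \le 0, \quad (i,j) \in \mathcal{E}, \]

and so, because of (34), we have

\[ 0 \le \overleftarrow{\varepsilon}^{(t)}_{i,j} \le \overleftarrow{\varepsilon}^{(t-1)}_{\max}, \quad (i,j) \in \mathcal{E}. \tag{36} \]

This proves the statement in the lemma. □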

This shows that the errors stay bounded but it does not prove convergence yet. (This result is essentially equivalent to the result that is obtained by taking the zero-temperature limit of the contraction coefficient that is computed in the SPA convergence analysis of [18]: the result is a contraction coefficient of 1, which is non-trivial, but not good enough to show that the message update map is a contraction.†)

It turns out that in order to improve these bounds we have to track the error values over two iterations, i.e., four half iterations. (We suspect that this is related to the fact that the girth of $\mathsf{N}(\theta)$, i.e., the length of the shortest cycle of $\mathsf{N}(\theta)$, is 4.)

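Lemma 67 Fix an iteration number $t \ge 1$. Taking advantage of the message gauge invariance that was mentioned in Remark 30, we can rescale the left-going and right-going fixed-point messages such that all $\{ \overleftarrow{\varepsilon}^{(t-1)}_{i,j} \}_{i,j}$ are non-negative and such that, additionally, $\min_{i,j} \overleftarrow{\varepsilon}^{(t-1)}_{i,j} = 0$. With this, we define the numbers $\overleftarrow{\varepsilon}^{(t-1)}_{\max}$ and $\overleftarrow{\varepsilon}^{(t+1)}_{\max}$ to be the smallest numbers that satisfy

\[ \overleftarrow{\varepsilon}^{(t-1)}_{i,j} \le \overleftarrow{\varepsilon}^{(t-1)}_{\max}, \quad (i,j) \in \mathcal{E}, \]
\[ \overleftarrow{\varepsilon}^{(t+1)}_{i,j} \le \overleftarrow{\varepsilon}^{(t+1)}_{\max}, \quad (i,j) \in \mathcal{E}. \]

Then

\[ 0 \le \overleftarrow{\varepsilon}^{(t+1)}_{i,j} \le \overleftarrow{\varepsilon}^{(t+1)}_{\max} \le \nu' \cdot \overleftarrow{\varepsilon}^{(t-1)}_{\max}, \quad (i,j) \in \mathcal{E}, \]

for some constant $0 \le \nu' < 1$ that depends only on $\theta$ and the fixed-point messages $\{ \overrightarrow{V}_{i,j} \}_{i,j}$ and $\{ \overleftarrow{V}_{i,j} \}_{i,j}$, i.e., $\nu'$ is independent of $t$.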



Proof: The statement e 



t^(*+i) 



'J 



^ 0, (ijj) G £ follows from 



applying Lemma 66 twice. Therefore, we can focus on the 

proof of Viiit^^ < v' ■ ^nax 



For a given edge (i, j) £ £, we observe that — Emax 



'£max ) ^ s-ij in (35) holds with equality only if e^j 



tr(*-l) 



-(t-1) 



for all edges {i,j') with j' ^ j. Similarly, for a given 



V(*) 



edge (i, j) G £ we observe that elj < "^max' in (36) holds 



with equality only if e^.j 



tr(*-l) 



=■ ^ ( ir 

tr(*-l)^ 



■£max V(l + '^"ax'') for all 

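edges $(i', j)$ with $i' \ne i$. This motivates the definition of the following sets where we track the edges for which a strict inequality holds w.r.t. the inequalities just mentioned. Namely, for $t \ge 1$ we define

\[ \overrightarrow{\mathcal{E}}^{(t)} \triangleq \Big\{ (i,j) \in \mathcal{E} \ \Big| \ \text{there is at least one edge } (i,j'), \ j' \ne j, \text{ such that } (i,j') \in \overleftarrow{\mathcal{E}}^{(t-1)} \Big\}, \]

\[ \overleftarrow{\mathcal{E}}^{(t)} \triangleq \Big\{ (i,j) \in \mathcal{E} \ \Big| \ \text{there is at least one edge } (i',j), \ i' \ne i, \text{ such that } (i',j) \in \overrightarrow{\mathcal{E}}^{(t)} \Big\}. \]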



With this, assume that £ ^* '' contains all the edges for 
which £^ < £ max ■ Clearly, t '*^ then contains all edges 



■2,J 






for which X^'] > ~tiiJ^/{l + tL^^). Similarly, 



£ '^*' contains all edges (i, j) for which e 



^(t) 



< £ 



tr(*-l) 



† Given the difference in the graphical model in [18] and the graphical model considered here, some care is required when comparing the temperature that is mentioned here and the temperature that is mentioned in Sections II and III.

If $\overleftarrow{\varepsilon}^{(t-1)}_{\max} = 0$ then the lemma is clearly true. So, assume that $\overleftarrow{\varepsilon}^{(t-1)}_{\max} > 0$. Let $\overleftarrow{\mathcal{E}}^{(t-1)}$ contain all edges $(i,j)$ for which $\overleftarrow{\varepsilon}^{(t-1)}_{i,j} < \overleftarrow{\varepsilon}^{(t-1)}_{\max}$. The assumptions in the lemma statement guarantee that there is at least one such edge, namely the edge(s) $(i,j)$ for which $\overleftarrow{\varepsilon}^{(t-1)}_{i,j} = 0$, and so the set $\overleftarrow{\mathcal{E}}^{(t-1)}$ is non-empty. It can then be verified that four half-iterations later we have $\overleftarrow{\mathcal{E}}^{(t+1)} = \mathcal{E}$.

The fact that there is, as mentioned in the lemma statement, a constant $\nu'$ that is $t$-independent and strictly smaller than 1 is then established by tracking the differences between the left- and the right-hand side in the above-mentioned strict inequalities. This is done with the help of (31) and (33). □

The convergence proof is then completed by applying Lemma 67 repeatedly. One detail needs to be mentioned, though. Namely, if $\min_{i,j} \overleftarrow{\varepsilon}^{(t+1)}_{i,j} > 0$, and a non-trivial re-gauging occurs at the beginning of the next application of Lemma 67, then in this re-gauging process the value of $\max_{i,j} \overleftarrow{\varepsilon}^{(t+1)}_{i,j}$ never increases (in fact, it always decreases).

Finally, we have

\[ \Big| \exp\Big( -F^{\#}_{\mathrm{Bethe}}\big( \{ \overrightarrow{V}^{(t)}_{i,j} \}, \{ \overleftarrow{V}^{(t)}_{i,j} \} \big) \Big) - \mathrm{perm}_{\mathrm{B}}(\theta) \Big| \le C \cdot e^{-\nu \cdot t} \]

for suitable constants $C, \nu \in \mathbb{R}_{>0}$. This follows from, on the one hand, the fact that when $F_{\mathrm{B}}$ achieves its minimum in the interior of $\Gamma_{n \times n}$ then we have equality between $F_{\mathrm{B}}$ and $F^{\#}_{\mathrm{Bethe}}$ at stationary points of the SPA [9], and, on the other hand, the above convergence analysis.

Appendix H 
Proof of Lemma 48 

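In a first step we evaluate $\mathrm{perm}(1_{n \times n})$. Namely, we obtain

\[ \mathrm{perm}(1_{n \times n}) = n! \overset{(a)}{=} \sqrt{2\pi n} \cdot \left( \frac{n}{e} \right)^n \cdot (1 + o(1)), \tag{37} \]

where at step (a) we have used Stirling's approximation of $n!$.

In a second step we evaluate $\mathrm{perm}_{\mathrm{B}}(1_{n \times n})$. From Definitions 11 and 12 it follows that

\[ \mathrm{perm}_{\mathrm{B}}(1_{n \times n}) = \exp\Big( -\min_{\gamma} F_{\mathrm{B}}(\gamma) \Big). \]

From Corollary 23 and symmetry considerations it follows that the minimum in the above expression is achieved by $\gamma_{i,j} = 1/n$, $(i,j) \in \mathcal{I} \times \mathcal{J}$. Therefore,

\[ \begin{aligned} \log\big( \mathrm{perm}_{\mathrm{B}}(1_{n \times n}) \big) &= -F_{\mathrm{B}}(\gamma) \Big|_{\gamma_{i,j} = 1/n, \ (i,j) \in \mathcal{I} \times \mathcal{J}} \\ &\overset{(a)}{=} -U_{\mathrm{B}}(\gamma) + H_{\mathrm{B}}(\gamma) \Big|_{\gamma_{i,j} = 1/n, \ (i,j) \in \mathcal{I} \times \mathcal{J}} \\ &\overset{(b)}{=} -n^2 \cdot \frac{1}{n} \cdot \log\left( \frac{1}{n} \right) + n^2 \cdot \left( 1 - \frac{1}{n} \right) \cdot \log\left( 1 - \frac{1}{n} \right) \\ &= n \cdot \log(n) + n \cdot (n - 1) \cdot \log\left( 1 - \frac{1}{n} \right) \\ &= n \cdot \log(n) + n \cdot (n - 1) \cdot \left( -\frac{1}{n} - \frac{1}{2n^2} + o\left( \frac{1}{n^2} \right) \right) \\ &= n \cdot \log(n) - (n - 1) - \frac{n - 1}{2n} + o(1) \\ &= n \cdot \log(n) - n + \frac{1}{2} + o(1), \end{aligned} \]

where at steps (a) and (b) we have used Corollary 15. Consequently,

\[ \mathrm{perm}_{\mathrm{B}}(1_{n \times n}) = \sqrt{e} \cdot \left( \frac{n}{e} \right)^n \cdot (1 + o(1)). \tag{38} \]

Combining (37) and (38) we obtain the promised result in the lemma statement.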
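The two asymptotic expressions can be checked against exact values. The sketch below uses only the closed-form expression $\log(\mathrm{perm}_{\mathrm{B}}(1_{n \times n})) = n \log(n) + n (n-1) \log(1 - 1/n)$ derived above and $\mathrm{perm}(1_{n \times n}) = n!$; combining (37) and (38) suggests that their ratio behaves like $\sqrt{2\pi n / e}$. The function names are hypothetical.

```python
import math

def log_perm_all_one(n):
    """log(n!), the log-permanent of the all-one n x n matrix."""
    return math.lgamma(n + 1)

def log_bethe_perm_all_one(n):
    """n*log(n) + n*(n-1)*log(1 - 1/n), valid for n >= 2."""
    if n < 2:
        raise ValueError("this closed form is used here for n >= 2 only")
    return n * math.log(n) + n * (n - 1) * math.log1p(-1.0 / n)

def ratio(n):
    """perm(1_{n x n}) / perm_B(1_{n x n})."""
    return math.exp(log_perm_all_one(n) - log_bethe_perm_all_one(n))

def asymptotic_ratio(n):
    """Leading-order behaviour sqrt(2*pi*n / e) implied by (37) and (38)."""
    return math.sqrt(2.0 * math.pi * n / math.e)
```

For $n = 3$ the exact ratio is $6 / (64/27) = 2.53125$, and the relative gap to the leading-order expression shrinks roughly like $1/(12n)$ as $n$ grows.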

Appendix I
Proof of Conjecture 51 for $\theta = 1_{n \times n}$

Let $\theta = 1_{n \times n}$. In this appendix we prove that for any $M \in \mathbb{Z}_{>0}$ and any $P \in \Psi_M$ it holds that

\[ \mathrm{perm}\big( \theta^{\uparrow P} \big) \le \big( \mathrm{perm}(\theta) \big)^M. \tag{39} \]

Although the proof is somewhat lengthy, the combinatorial idea behind it is quite straightforward. Moreover, the only inequality that we use is the AM-GM inequality, which says that the arithmetic mean of a list of non-negative real numbers is at least as large as the geometric mean of this list of numbers. Notably, there is no need to use Stirling's approximation of the factorial function.

Towards showing (39), let us fix some positive integer $M$, fix some collection of permutation matrices $P = \{ P^{(i,j)} \}_{i \in \mathcal{I}, j \in \mathcal{J}} \in \Psi_M$, define $\tilde{\theta} \triangleq \theta^{\uparrow P}$ as in Definition 37, and let the row and column index sets of $\theta^{\uparrow P}$ be $\mathcal{I} \times [M]$ and $\mathcal{J} \times [M]$, respectively. With this, it follows from Definition 1 that

\[ \mathrm{perm}(\theta) = \sum_{\sigma} \prod_{i \in \mathcal{I}} \theta_{i,\sigma(i)}, \tag{40} \]

\[ \mathrm{perm}(\tilde{\theta}) = \sum_{\tilde{\sigma}} \prod_{(i,m) \in \mathcal{I} \times [M]} \tilde{\theta}_{(i,m), \tilde{\sigma}((i,m))}, \tag{41} \]

where $\sigma$ ranges over all permutations of the set $\mathcal{I}$ and where $\tilde{\sigma}$ ranges over all permutations of the set $\mathcal{I} \times [M]$.

Note that, because all entries of $\tilde{\theta}$ are either equal to zero or to one, the products in (41) evaluate either to zero or to one. Computing $\mathrm{perm}(\tilde{\theta})$ is therefore equivalent to counting the $\tilde{\sigma}$'s for which these products evaluate to one. Equivalently, $\mathrm{perm}(\tilde{\theta})$ equals the number of perfect matchings in the NFG $\mathsf{N}(\tilde{\theta})$.
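For very small $n$ and $M$, the inequality (39) can be checked by brute force. The sketch below assumes that the lifted matrix is the $(nM) \times (nM)$ block matrix whose block $(i,j)$ equals $\theta_{i,j} \cdot P^{(i,j)}$, which for the all-one matrix is simply $P^{(i,j)}$; this block layout is an assumption about Definition 37, and all function names are hypothetical.

```python
import math
import random
from itertools import permutations

def permanent(matrix):
    """Brute-force permanent of a small square matrix."""
    n = len(matrix)
    total = 0
    for sigma in permutations(range(n)):
        prod = 1
        for row, col in enumerate(sigma):
            prod *= matrix[row][col]
            if prod == 0:
                break
        total += prod
    return total

def identity_matrix(M):
    return [[1 if m == k else 0 for k in range(M)] for m in range(M)]

def random_permutation_matrix(M, rng):
    cols = list(range(M))
    rng.shuffle(cols)
    return [[1 if cols[m] == k else 0 for k in range(M)] for m in range(M)]

def lift_all_one(n, M, blocks):
    """(nM) x (nM) zero-one matrix whose block (i, j) is blocks[i][j]."""
    big = [[0] * (n * M) for _ in range(n * M)]
    for i in range(n):
        for j in range(n):
            for m in range(M):
                for k in range(M):
                    big[i * M + m][j * M + k] = blocks[i][j][m][k]
    return big

def random_lift_permanent(n, M, seed):
    """Permanent of a randomly chosen M-lift of the all-one n x n matrix."""
    rng = random.Random(seed)
    blocks = [[random_permutation_matrix(M, rng) for _ in range(n)] for _ in range(n)]
    return permanent(lift_all_one(n, M, blocks))
```

With identity blocks the lifted matrix decomposes into $M$ independent copies of the all-one matrix, so its permanent is $(n!)^M$; random choices of the blocks should never exceed this value.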

Example 68 Some of the steps of the proof will be illustrated 
with the help of the NFGs in Figure 3 (which are reproduced 
in Figure 10 for ease of reference), where n = 3 and M = A. 

• Figure 10(a) shows the NFG N(0); pcrni(0) equals 
the number of perfect matchings in Figure 10(a). Note: 
pcriii(0) = 7i!. 

. //P = {P(-^)},,^,^.,^ = {l}^eT,,eJ' ^hereiis 
the identity matrix of size M x M, then we obtain the 
M -cover shown in Figure 10(b), which is a "trivial" 
M -cover of N(0); perm (©^■'^j equals the number of 
perfect matchings in Figure 10(b). Note: perm (0^^) ~ 
(perm(0))*^ = (ii!)*^. 

• For a "non-trivial " collection of permutation matrices 
P = |P'-''-'H ,__ .^_ we obtain an M-cover like in 

Figure 10(c); perm (0^ ) equals the number of perfect 
matchings in Figure 10(c). D 
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Let i ~ 2. Then d-^,~i-i, m £ [A/], is the number of 
possibilities of choosing the edge of the perfect matching 
of N{6) that is incident on {i,m). Because the i-th row 
of 6 contains only ones, because of the above partitioning 
observation, and because of the observation at the end of 
the above step, we find that 



/ ^ i,m\crl 
■m£[M] 



s^M -{n- 1). 



(42) 



Fig. 10. (a) NFG N(0) for n = 3. (b) "Trivial" 4-cover of N(0) (c) 
A possible 4-cover of N(0). The coloring of the edges in (b) and (c) show 
visually the fact that he sets d{(i, m)), m g [M], form a partition of J7 X [M] 
(here for i = 1). (For more details, see the text in Appendix I). 



Let us therefore count the number of perfect matchings in 
N(0), cf. Figure 10(c). Before continuing, we define d{{i, m)), 
(i, m) £ Ix [M], to be the set of neighbors of the vertex {i, m) 
in N{e), i.e.. 



diii,m))^[U,m')eJx[M] 



5('J) 



1 



One can easily verify that for every i El, the sets d{{i,m)), 
m g [M], form a partition of J' x [AI]. (See Figures 10(b)- 
(c) that highlight this partitioning for i = 1.) This observation 
will be the crucial ingredient of the following steps. 

We count the number of perfect matchings in N(0) by 
considering the vertices {(i, m)} ,. ., for i = 1, i = 2, up to 
i — n, thereby counting in how many ways we can specify a 
such that the product in (41) equals one. Note that because of 
the above partitioning observation, we can, conditioned on the 
selection of a perfect matching up to and including step j — 1 
(which we shall symbohcally denote by ^l^^), consider the 
vertices {{i,m)^ , , independently. Then we define 

d- ,~i-i, (i,to) e I X [A/l, 

to be the number of possibilities of choosing a{{i,Tn)), i.e., 
the number of ways that the edge of the perfect matching of 
N(0) that is incident on {i,m) can be chosen. 

• Let $i = 1$. Then $d_{i,m|\tilde\sigma_1^{i-1}}$, $m \in [M]$, is the number of
possibilities of choosing the edge of the perfect matching
of $\mathsf{N}(\tilde\theta)$ that is incident on $(i,m)$. Because the $i$-th
row of $\theta$ contains only ones, and because of the above
partitioning observation, we find that $d_{1,m|\tilde\sigma_1^{0}} = n$ for all
$m \in [M]$, and so

\[
  \sum_{m \in [M]} d_{1,m|\tilde\sigma_1^{0}} = Mn.
\]

We observe that, whatever the selection of these $M$ edges
is, $M$ vertices on the right-hand side will be incident on
a selected edge, and therefore be "not available anymore"
in the following steps. This reduces the number of "available"
right-hand side vertices to $Mn - M = M \cdot (n-1)$.



(If all permutation matrices in $P$ are identity matrices,
then it can be verified that the inequality in (42) is
an equality. However, for general $P$, equality in (42)
does not need to hold.) Similar to the end of the above
step, we observe that whatever the selection of these
$M$ edges is, $M$ vertices on the right-hand side will
be incident on a selected edge, and therefore be "not
available anymore" in the following steps. This reduces
the number of "available" right-hand side vertices to
$M \cdot (n-1) - M = M \cdot (n-2)$.
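The counting argument can be cross-checked by brute force for small $n$ and $M$. The sketch below builds the lifted version of the all-ones matrix for a given collection of permutations and compares its permanent with $(n!)^M$. It assumes the same list-based encoding `P[(i, j)][m] = m'` of the permutation matrices and a flattening of the index pair $(i,m)$ to `i*M + m`; both conventions, as well as the helper names, are choices made for this illustration only.

```python
import itertools
import math
import random

def permanent(A):
    """Permanent of a square matrix by summing over all permutations."""
    N = len(A)
    return sum(
        math.prod(A[r][s[r]] for r in range(N))
        for s in itertools.permutations(range(N))
    )

def lift_all_ones(n, M, P):
    """(Mn) x (Mn) lifting of the all-ones n x n matrix.

    Row (i, m) and column (j, m') are flattened as i*M + m and j*M + m'.
    The entry is 1 exactly when P[(i, j)][m] == m'.
    """
    A = [[0] * (n * M) for _ in range(n * M)]
    for i in range(n):
        for j in range(n):
            for m in range(M):
                A[i * M + m][j * M + P[(i, j)][m]] = 1
    return A

n, M = 3, 2
bound = math.factorial(n) ** M

# Trivial cover: every permutation is the identity.
identity = {(i, j): list(range(M)) for i in range(n) for j in range(n)}
trivial = permanent(lift_all_ones(n, M, identity))

# A few random covers; record the largest permanent seen.
rng = random.Random(1)
worst = 0
for _ in range(20):
    P = {(i, j): rng.sample(range(M), M) for i in range(n) for j in range(n)}
    worst = max(worst, permanent(lift_all_ones(n, M, P)))

print(trivial, worst, bound)
```

For the trivial cover the graph splits into $M$ disjoint copies of the base graph, so its permanent equals the bound; for the random covers the permanent should never exceed it.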

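Continuing as above, we observe that for general $i \in \mathcal{I}$
it holds that

\[
  \sum_{m \in [M]} d_{i,m|\tilde\sigma_1^{i-1}} \le M \cdot (n-i+1).
  \tag{43}
\]

Note that for $i \in \mathcal{I}$ we have

\begin{align*}
  \prod_{m \in [M]} d_{i,m|\tilde\sigma_1^{i-1}}
    &= \Biggl( \prod_{m \in [M]} d_{i,m|\tilde\sigma_1^{i-1}}^{1/M} \Biggr)^{\!M} \\
    &\overset{(a)}{\le}
       \Biggl( \frac{1}{M} \sum_{m \in [M]} d_{i,m|\tilde\sigma_1^{i-1}} \Biggr)^{\!M} \\
    &\overset{(b)}{\le}
       \Bigl( \frac{1}{M} \cdot M \cdot (n-i+1) \Bigr)^{M} \\
    &= (n-i+1)^M,
  \tag{44}
\end{align*}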



where at step (a) we have used the fact that the geometric mean 
of a collection of non-negative numbers is upper bounded by 
the arithmetic mean of the same collection of numbers, and 
where at step (b) we have used (43). 
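As a small sanity check of steps (a) and (b), the following sketch draws random non-negative integers $d_1, \ldots, d_M$ whose sum is at most $M \cdot (n-i+1)$ and confirms that their product never exceeds $(n-i+1)^M$. The function name `amgm_bound_holds` and the particular values of `M`, `n`, and `i` are illustrative assumptions.

```python
import math
import random

def amgm_bound_holds(d, cap):
    """For non-negative integers d with sum(d) <= len(d) * cap,
    check prod(d) <= cap ** len(d), mirroring steps (a)-(b) of (44)."""
    M = len(d)
    assert sum(d) <= M * cap
    return math.prod(d) <= cap ** M

rng = random.Random(0)
M, n, i = 4, 5, 2
cap = n - i + 1  # plays the role of (n - i + 1)

ok = True
for _ in range(1000):
    d = [rng.randint(0, n) for _ in range(M)]
    if sum(d) <= M * cap:  # only samples satisfying the analogue of (43)
        ok = ok and amgm_bound_holds(d, cap)
print(ok)
```

All arithmetic is done on integers, so the comparison is exact.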

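With this, we obtain the following upper bound on $\operatorname{perm}(\tilde\theta)$.
Namely,

\begin{align*}
  \operatorname{perm}(\tilde\theta)
    &\overset{(a)}{=}
       \sum_{\tilde\sigma_1} \sum_{\tilde\sigma_2|\tilde\sigma_1^{1}} \cdots
       \sum_{\tilde\sigma_n|\tilde\sigma_1^{n-1}} 1 \\
    &\overset{(b)}{=}
       \sum_{\tilde\sigma_1} \sum_{\tilde\sigma_2|\tilde\sigma_1^{1}} \cdots
       \sum_{\tilde\sigma_{n-1}|\tilde\sigma_1^{n-2}}
       \prod_{m \in [M]} d_{n,m|\tilde\sigma_1^{n-1}} \\
    &\overset{(c)}{\le}
       \sum_{\tilde\sigma_1} \sum_{\tilde\sigma_2|\tilde\sigma_1^{1}} \cdots
       \sum_{\tilde\sigma_{n-1}|\tilde\sigma_1^{n-2}} (n-n+1)^M \\
    &\overset{(d)}{=}
       (n-n+1)^M \cdot
       \sum_{\tilde\sigma_1} \sum_{\tilde\sigma_2|\tilde\sigma_1^{1}} \cdots
       \sum_{\tilde\sigma_{n-1}|\tilde\sigma_1^{n-2}} 1 \\
    &\overset{(e)}{\le}
       \Biggl( \prod_{i \in [n]} (n-i+1) \Biggr)^{\!M}
     = (n!)^M \\
    &\overset{(f)}{=}
       \bigl( \operatorname{perm}(\theta) \bigr)^M,
\end{align*}

where at step (a) we have used the fact that $\operatorname{perm}(\tilde\theta)$ equals
the number of perfect matchings in $\mathsf{N}(\tilde\theta)$, where at step (b)
we have used the definition of $d_{n,m|\tilde\sigma_1^{n-1}}$, where at step (c) we
have used (44) for $i = n$, where at step (d) we take advantage
of the fact that $(n-n+1)^M$ is independent of $\tilde\sigma_1^{n-1}$, where at
step (e) we apply similar results as at steps (b)-(d) (note that
for all $i$, the quantity $(n-i+1)^M$ is independent of $\tilde\sigma_1^{i-1}$), and
where at step (f) we have used the observation $\operatorname{perm}(\theta) = n!$.
This shows that the desired inequality (39) indeed holds for
arbitrary positive integer $M$ and an arbitrary collection $P$ of
$M \times M$ permutation matrices.

Appendix J
Proof of Lemma 54

We first prove $\operatorname{perm}_{\mathrm{B}}(\theta^{\uparrow P}) \ge \bigl( \operatorname{perm}_{\mathrm{B}}(\theta) \bigr)^M$ and then
$\operatorname{perm}_{\mathrm{B}}(\theta^{\uparrow P}) \le \bigl( \operatorname{perm}_{\mathrm{B}}(\theta) \bigr)^M$, from which the promised
equality follows.

For the rest of the proof, we will use the short-hand
$\tilde\theta$ for $\theta^{\uparrow P}$ and we will assume that there is at least one
permutation $\sigma : [n] \to [n]$ such that $\prod_i \theta_{i,\sigma(i)} > 0$ (otherwise,
$\operatorname{perm}_{\mathrm{B}}(\tilde\theta) = \operatorname{perm}_{\mathrm{B}}(\theta) = 0$). Moreover, $\mathsf{N}(\tilde\theta)$ will be the
NFG associated with $\tilde\theta$ (see the footnote below).

Footnote: Let $\tilde{\mathsf{N}}$ be the $M$-cover of $\mathsf{N}(\theta)$ corresponding to $P$. Note that, strictly
speaking, $\tilde{\mathsf{N}}$ and $\mathsf{N}(\tilde\theta)$ are not the same NFG. The former is an $M$-cover
of $\mathsf{N}(\theta)$ (therefore it has two times $Mn$ function nodes, all of them with
degree $n$), whereas the latter is a complete bipartite graph with two times
$Mn$ function nodes. However, with the above condition on $\theta$, for all practical
purposes they are the same because $F_{\mathrm{B},\mathsf{N}(\tilde\theta)}(\tilde\gamma) < \infty$ only for matrices
$\tilde\gamma \in \Gamma_{(Mn) \times (Mn)}$ for which $\tilde\gamma_{(i,m),(j,m')} = 0$ whenever $P^{(i,j)}_{m,m'} = 0$,
$(i,m,j,m') \in \mathcal{I} \times [M] \times \mathcal{J} \times [M]$.

Towards proving the first inequality, let $\gamma \in \Gamma_{n \times n}$ be a
matrix that minimizes $F_{\mathrm{B},\mathsf{N}(\theta)}$. Based on $\gamma$, we define the
$(Mn) \times (Mn)$ matrix $\tilde\gamma$ with entries

\[
  \tilde\gamma_{(i,m),(j,m')} \triangleq \gamma_{i,j} \cdot P^{(i,j)}_{m,m'}
\]

for all $(i,m,j,m') \in \mathcal{I} \times [M] \times \mathcal{J} \times [M]$. One can easily verify
that $\tilde\gamma \in \Gamma_{(Mn) \times (Mn)}$ and that $F_{\mathrm{B},\mathsf{N}(\tilde\theta)}(\tilde\gamma) = M \cdot F_{\mathrm{B},\mathsf{N}(\theta)}(\gamma)$.
From this and Corollary 15 it then follows that

\[
  \operatorname{perm}_{\mathrm{B}}(\tilde\theta) \ge \bigl( \operatorname{perm}_{\mathrm{B}}(\theta) \bigr)^M.
\]

Towards proving the second inequality, let $\tilde\gamma \in \Gamma_{(Mn) \times (Mn)}$
be a matrix that minimizes $F_{\mathrm{B},\mathsf{N}(\tilde\theta)}$. One can easily verify
that $\tilde\gamma_{(i,m),(j,m')} = 0$ whenever $P^{(i,j)}_{m,m'} = 0$, $(i,m,j,m') \in
\mathcal{I} \times [M] \times \mathcal{J} \times [M]$. Based on $\tilde\gamma$, we define the $n \times n$ matrix
$\gamma$ with entries

\[
  \gamma_{i,j} \triangleq \frac{1}{M} \sum_{m \in [M]} \sum_{m' \in [M]} \tilde\gamma_{(i,m),(j,m')}
\]

for all $(i,j) \in \mathcal{I} \times \mathcal{J}$. One can easily verify that $\gamma \in \Gamma_{n \times n}$. Let
$\tilde{\boldsymbol\gamma}_{(i,m)}$ be the length-$n$ vector based on the $(i,m)$-th row of $\tilde\gamma$,
where we include an entry only if $P^{(i,j)}_{m,m'} = 1$. Similarly, define
the length-$n$ vector $\tilde{\boldsymbol\gamma}_{(j,m')}$ based on the $(j,m')$-th column
of $\tilde\gamma$. One can verify that the $i$-th row of $\gamma$, i.e., $\boldsymbol\gamma_i$, equals
$\frac{1}{M} \sum_{m} \tilde{\boldsymbol\gamma}_{(i,m)}$. Similarly, the $j$-th column of $\gamma$, i.e., $\boldsymbol\gamma_j$, equals
$\frac{1}{M} \sum_{m'} \tilde{\boldsymbol\gamma}_{(j,m')}$. Then

\begin{align*}
  H_{\mathrm{B},\mathsf{N}(\tilde\theta)}(\tilde\gamma)
    &\overset{(a)}{=}
       \frac{1}{2} \sum_{i} \sum_{m} S\bigl( \tilde{\boldsymbol\gamma}_{(i,m)} \bigr)
       + \frac{1}{2} \sum_{j} \sum_{m'} S\bigl( \tilde{\boldsymbol\gamma}_{(j,m')} \bigr) \\
    &\overset{(b)}{\le}
       \frac{M}{2} \sum_{i} S(\boldsymbol\gamma_i)
       + \frac{M}{2} \sum_{j} S(\boldsymbol\gamma_j) \\
    &\overset{(c)}{=}
       M \cdot H_{\mathrm{B},\mathsf{N}(\theta)}(\gamma),
\end{align*}

where at step (a) we have used Lemma 21, where at step (b)
we have used the concavity of the $S$-function (cf. Theorem 20),
and where at step (c) we have used once again Lemma 21.

Moreover, one can easily show that $U_{\mathrm{B},\mathsf{N}(\tilde\theta)}(\tilde\gamma) =
M \cdot U_{\mathrm{B},\mathsf{N}(\theta)}(\gamma)$, and so $F_{\mathrm{B},\mathsf{N}(\tilde\theta)}(\tilde\gamma) \ge M \cdot F_{\mathrm{B},\mathsf{N}(\theta)}(\gamma)$. From this
and Corollary 15 it then follows that

\[
  \operatorname{perm}_{\mathrm{B}}(\tilde\theta) \le \bigl( \operatorname{perm}_{\mathrm{B}}(\theta) \bigr)^M.
\]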
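The scaling relation between the Bethe free energy of a lifted matrix and that of the base matrix, which underlies the first inequality of this proof, can be illustrated numerically. The sketch below is written under two assumptions that are not taken from this appendix: the Bethe entropy is taken to have the form $H_{\mathrm{B}}(\gamma) = -\sum_{i,j} \gamma_{i,j} \log \gamma_{i,j} + \sum_{i,j} (1-\gamma_{i,j}) \log(1-\gamma_{i,j})$, and the permutation matrices are stored as lists with `P[(i, j)][m] = m'`. The helper names are hypothetical.

```python
import math
import random

def xlogy(x, y):
    """x * log(y) with the convention 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bethe_free_energy(theta, gamma):
    """U_B - H_B with U_B = -sum g*log(theta) and the assumed
    entropy H_B = -sum g*log(g) + sum (1-g)*log(1-g)."""
    U = 0.0
    H = 0.0
    for trow, grow in zip(theta, gamma):
        for t, g in zip(trow, grow):
            U -= xlogy(g, t)
            H += -xlogy(g, g) + xlogy(1 - g, 1 - g)
    return U - H

def lift(A, P, M):
    """Entry ((i,m),(j,m')) equals A[i][j] if P[(i,j)][m] == m', else 0."""
    n = len(A)
    out = [[0.0] * (n * M) for _ in range(n * M)]
    for i in range(n):
        for j in range(n):
            for m in range(M):
                out[i * M + m][j * M + P[(i, j)][m]] = A[i][j]
    return out

rng = random.Random(0)
n, M = 3, 4
theta = [[rng.uniform(0.5, 2.0) for _ in range(n)] for _ in range(n)]
# A doubly stochastic matrix with all entries strictly between 0 and 1.
gamma = [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.3, 0.2, 0.5]]
P = {(i, j): rng.sample(range(M), M) for i in range(n) for j in range(n)}

F = bethe_free_energy(theta, gamma)
F_lift = bethe_free_energy(lift(theta, P, M), lift(gamma, P, M))
print(abs(F_lift - M * F) < 1e-9)
```

Zero entries of the lifted matrices contribute nothing to either term, and every nonzero entry of the base matrices appears exactly $M$ times, which is why the two values agree up to floating-point error.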



Appendix K 
Proof of Lemma 61 

Because k satisfies the conditions listed in Theorem 60, the 
concavity statement for the Bethe entropy function and the 
convexity statement for the Bethe free energy function follow 
immediately. 

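Therefore, let us turn our attention to evaluating the ratio
$\operatorname{perm}(1_{n \times n}) / \operatorname{perm}_{\mathrm{B}}^{(k)}(1_{n \times n})$. In a first step we evaluate
$\operatorname{perm}(1_{n \times n})$. Namely, as in Lemma 48, we have

\[
  \operatorname{perm}(1_{n \times n}) = n!
    = \sqrt{2 \pi n} \cdot \Bigl( \frac{n}{e} \Bigr)^{n} \cdot \bigl( 1 + o(1) \bigr).
  \tag{45}
\]

In a second step we evaluate $\operatorname{perm}_{\mathrm{B}}^{(k)}(1_{n \times n})$. From Theorem 60
and symmetry considerations it follows that the
minimum in the above expression is achieved by $\gamma_{i,j} = 1/n$,
$(i,j) \in \mathcal{I} \times \mathcal{J}$. Therefore,

\begin{align*}
  \log\bigl( \operatorname{perm}_{\mathrm{B}}^{(k)}(1_{n \times n}) \bigr)
    &\overset{(a)}{=} -U_{\mathrm{B}}(\gamma) + H_{\mathrm{B}}^{(k)}(\gamma) \\
    &\overset{(b)}{=}
       -n^2 \cdot \frac{1}{n} \log\Bigl( \frac{1}{n} \Bigr)
       + n^2 \cdot \Bigl( 1 - \frac{1}{2n} \Bigr) \cdot \Bigl( 1 - \frac{1}{n} \Bigr)
         \cdot \log\Bigl( 1 - \frac{1}{n} \Bigr) \\
    &= n \cdot \log(n)
       + \Bigl( n - \frac{1}{2} \Bigr) \cdot (n-1) \cdot \log\Bigl( 1 - \frac{1}{n} \Bigr) \\
    &= n \cdot \log(n)
       + \Bigl( n - \frac{1}{2} \Bigr) \cdot (n-1) \cdot
         \Bigl( -\frac{1}{n} - \frac{1}{2n^2} + o\Bigl( \frac{1}{n^2} \Bigr) \Bigr) \\
    &= n \cdot \log(n) - n + 1 + o(1),
\end{align*}

where at step (a) we have used $F_{\mathrm{B}}^{(k)}(\gamma) = U_{\mathrm{B}}(\gamma) - H_{\mathrm{B}}^{(k)}(\gamma)$, and
where at step (b) we have used $U_{\mathrm{B}}(\gamma) = -\sum_{i,j} \gamma_{i,j} \log(\theta_{i,j}) = 0$
and the expression for $H_{\mathrm{B}}^{(k)}(\gamma)$ from Lemma 58. Therefore,

\[
  \operatorname{perm}_{\mathrm{B}}^{(k)}(1_{n \times n})
    = e \cdot \Bigl( \frac{n}{e} \Bigr)^{n} \cdot \bigl( 1 + o(1) \bigr).
  \tag{46}
\]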



Combining (45) and (46) we obtain the desired result. 
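The asymptotics behind (45) and (46) can be checked numerically. The sketch below assumes the closed form $n \log(n) + (n - \tfrac{1}{2})(n-1)\log(1 - \tfrac{1}{n})$ for $\log \operatorname{perm}_{\mathrm{B}}^{(k)}(1_{n \times n})$, which is a reading of the damaged display above rather than a verified formula, and compares it with $n \log(n) - n + 1$. It also compares the logarithm of the ratio with $\tfrac{1}{2}\log(2\pi n) - 1$, the value obtained by dividing Stirling's approximation of $n!$ by $e \cdot (n/e)^n$. The function names are hypothetical.

```python
import math

def log_perm_bethe_k(n):
    """Assumed closed form: n*log(n) + (n - 1/2)*(n - 1)*log(1 - 1/n)."""
    return n * math.log(n) + (n - 0.5) * (n - 1) * math.log1p(-1.0 / n)

def log_ratio(n):
    """log( n! / perm_B^(k) ) using lgamma(n + 1) = log(n!)."""
    return math.lgamma(n + 1) - log_perm_bethe_k(n)

for n in (10, 100, 1000, 10000):
    # Gap to the leading terms n*log(n) - n + 1; should shrink with n.
    gap = log_perm_bethe_k(n) - (n * math.log(n) - n + 1)
    # Deviation of the log-ratio from (1/2)*log(2*pi*n) - 1.
    predicted = 0.5 * math.log(2 * math.pi * n) - 1
    print(n, gap, log_ratio(n) - predicted)
```

Both printed deviations should decay roughly like $1/n$, consistent with the $o(1)$ terms in the derivation.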